
Winter School

Shaping new approaches to data management in arts and humanities

The main objective is to introduce scientific and academic communities in the arts and humanities to the principles and practices of responsible research and Open Science.

Caring for data to shape the future



Where do our commitments lie in safeguarding our collective cultural memory? How does the overwhelming and unstoppable digital revolution change our social, cultural and scholarly life, and how does it lead us to rethink and reshape our priorities? In her keynote speech, Prof. Fernanda Rollo shows how the global societal challenges and sustainable development goals of our age shape data challenges and responsibilities within the Digital Humanities. She identifies seven key areas that will directly influence new approaches to data management in the arts and humanities:

  • Awareness and training
  • Digital heritage
  • The frightening and overwhelming loss of digital heritage
  • Preservation of digital heritage
  • Organization of digital heritage
  • Collaborative work
  • A cultural change

The second half of the talk outlines how pillars of the Open Science paradigm (such as Citizen Science, training and advocacy, Open Access publishing models, infrastructures, new research metrics) are instrumental in tackling these issues.

Maria Fernanda Rollo - Caring for Data to Shape the Future

Speaker for this session

  • Maria Fernanda Rollo

    Fernanda has a PhD in Contemporary History from the Faculdade de Ciências Sociais e Humanidades, Universidade Nova de Lisboa, and is Associate Professor in the same faculty. A researcher and former President of the Institute for Contemporary History, Prof. Fernanda Rollo has coordinated research projects and has published several texts in the areas of Economic and Social History, the history of Portugal, and the history of innovation and the organization of science. She is a former Secretary of State for Science, Technology and Higher Education.

What is Data in the Humanities?


In the Humanities domain, we see a broad diversity of perceptions of what constitutes scholarly data, and such perceptions carry tacit assumptions that could and should define domain- or discipline-specific approaches to research data sharing workflows. In this workshop, participants will be encouraged to examine their own scholarly practices and those of others, refining our responses to the fundamental question: “what are Humanities research data?” After a warm-up session dedicated to epistemological reflections on the role of data within humanities research, the session introduces participants to the basics of research data management, data services and the FAIR principles in a humanities context. The aim is to show how these new concepts, potentially abstract at first, enable researchers to conduct more effective research when they are well translated into community practices.

Clearly, the generic research data management guidelines do not always align well with the cultural, conceptual and epistemological complexity of research data in the arts and humanities and the many entailments of this complexity, such as:

  • Data in the humanities comes in a wide variety of source types, formats and corpus sizes. The word ‘data’ itself is hardly used and mostly replaced by the notion of primary sources;
  • Researchers may lack know-how in dealing with the various dimensions of data management: documentation, hosting, identification and re-use conditions are not part of the education curricula in the humanities;
  • A fundamental difference between the epistemic cultures of hard sciences and arts and humanities is that in the arts and humanities the wide range of scholarly information referred to as cultural heritage data are not autonomous products of research projects but are deeply embedded in the memory of the institutions (museums, libraries, archives) that preserve, curate and (co)produce them;
  • These institutions are not merely data providers: ownership of heritage data is inherently shared between them, the researcher communities, the public, and the people and cultures that give rise to the objects in question;
  • Access to cultural heritage, and its digital availability, is the primary condition of research in most humanities disciplines, and it defines the reusability and accessibility of the scholarship built on it.

During the session, we are going to address these challenges one by one in the context of responsible open data sharing practices. Participants are encouraged to discuss data management issues related to their own projects (or project ideas) and to contact the trainers beforehand.

Erzsébet Tóth-Czifra - What is data in the humanities?

Speaker for this session

  • Erzsébet Tóth-Czifra

    Erzsébet started her job as Open Science Officer at DARIAH-EU in March 2018. She studied comparative literature, cultural studies and linguistics at the University of Szeged (Hungary) as well as at Eötvös Loránd University in Budapest. She received her PhD in Cultural Linguistics summa cum laude in 2018 at Eötvös Loránd University for her corpus study of the evolution of certain Hungarian word-formation schemata. After graduating, she taught linguistics and Hungarian as a foreign language at Eötvös Loránd University as well as at the Fachhochschule Burgenland in Eisenstadt, Austria. In 2016, her commitment to democracy in science led her to join ScienceOpen, a research evaluation and discovery platform, as Content Integration Manager and open science advocate. Her role at DARIAH enables her to merge the two main pillars of her professional life: humanities and open science. She is responsible for fostering and implementing open science practices across DARIAH and its cooperating partners, and contributes to the design and implementation of open science policy statements, guidelines and services related to the open dissemination of research results in the humanities.

Data and Software citation practices and PIDs


This session will introduce persistent identifiers (PIDs), outline their importance and explain how to cite data and software. During this highly interactive session we will cover both the why and the how of data and software citation and discuss issues that can arise specifically in a humanities context.

Frances Madden - Data and Software Citation Practices and PIDs

Speaker for this session

  • Frances Madden

    Frances Madden is Research Identifiers Lead at The British Library, overseeing the British Library’s contribution to the FREYA project. Her role includes looking at integrating persistent identifiers into the Library’s systems and representing the humanities and social sciences sectors within FREYA. Prior to joining the British Library, Frances worked as a research data manager and an archivist at King’s College London and Royal Holloway, University of London.

Open Research Notebooks


This session provides an introduction to Jupyter notebooks and their potential for well-documented, reproducible and reusable Digital Humanities outputs and workflows. More specifically, it covers the following topics:

  • The history of research notebooks
  • Environments
  • Hosted vs Local
  • The Python ecosystem
  • Data analysis with Pandas
  • Text analysis with spaCy
  • Visualization with Seaborn and Matplotlib
  • Examples of the application of Jupyter notebooks in Digital Humanities research projects.

Javier de la Rosa - Open Research Notebooks

Speaker for this session

  • Javier de la Rosa

    Javier de la Rosa is a Postdoctoral Researcher at UNED working on Natural Language Processing, where he is part of the POSTDATA Project. He holds a PhD in Hispanic Studies with a specialization in Digital Humanities from the University of Western Ontario, and a Master's in Artificial Intelligence from the University of Seville. Javier has previously worked as a Research Engineer at Stanford University, and as the Technical Lead at the University of Western Ontario CulturePlex Lab. His interests range from NLP applied to historical texts to the analysis of networks of Fine Arts artifacts and the visual culture of the past.

Copyright and (Open) Licensing


As researchers, we are both creators of intellectual works and users of others’ works. Copyright addresses the proper balance between the interests of creators and the possibility of reuse by the public. We will therefore look at principles of copyright and statutory licenses for research and education and investigate the provisions of the recent EU Directive on Copyright in the Digital Single Market. In addition, we will discuss how Open licenses like Creative Commons work in theory and practice and how we can employ them to ensure both widespread re-use of our intellectual outputs and proper attribution.

Walter Scholger - Copyright and (Open) licensing

Speaker for this session

  • Walter Scholger

    Walter studied History and Applied Cultural Sciences in Graz (Austria) and Maynooth (Ireland). He served as the Deputy Director of the Center for Information Modeling – Austrian Centre for Digital Humanities at the University of Graz (Austria) from 2008 to 2019, and continues to deal with administrative issues, project management and the coordination of the Centre’s teaching activities. He is involved in several international projects and is a member (and co-lead) of several working groups of DH umbrella organizations focusing on legal aspects of academia and digitisation (e.g. CLARIN Legal Issues Committee) and DH curricula development (e.g. ADHO Digital Pedagogy SIG), and has given numerous international workshops on IPR, licensing and data protection in recent years. A veteran contributor to DARIAH-EU, he was one of the driving forces behind the “DH Course Registry” and has been active in several Working Groups, serving as Co-Lead of the “Training and Education” Working Group from 2013 to 2017 and of the Working Group on “Ethics and Legality in Digital Arts and Humanities” (ELDAH) since 2017. In January 2019, he was appointed Deputy National Coordinator for CLARIAH Austria.

Data Management Plans


In this workshop we will underline the importance and the role that data management plans play in good data management. A Data Management Plan (DMP) is a formal document that specifies how research data will be handled both during and after a research project, describing what data will be collected or generated, the methodology and standards followed, whether and how this data will be shared and/or made open, and how it will be curated and preserved. DMPs are living documents, updated when needed throughout the research process. More and more research funders require a DMP as a deliverable, but DMPs are also part of good research practice for several other reasons.

Learning objectives:

  • Understand the importance of research data management;
  • Discover how a Data Management Plan (DMP) can help you be more efficient in your research;
  • Be aware of the European Commission’s requirements on research data;
  • Be able to start your own research data management plan.

In the second part of the workshop, participants will be introduced to DMP tools and will start drafting their own data management plan.

Antónia Correia - Data Management Plans

Speaker for this session

  • Antónia Correia

    Antónia Correia works as an open science project officer for FOSTER Plus, FIT4RRI and ON-MERRIT at the University of Minho. She has extensive experience working in academic libraries and supporting researchers in scientific publishing, visibility and evaluation. She collaborates with Universidade Nova de Lisboa’s Doctoral School in the Information Literacy and Research Data Management courses. Working for the FOSTER Plus project, she coordinated the Portuguese translation of the Open Science Training Handbook and collaborated on the Open Science Training Toolkit. She is part of OpenAIRE’s Community of Practice for training coordinators and managers and of the Research Data Alliance’s Portuguese Node. Her research interests include all subjects related to Open Science, Research Data Management, Scholarly Publication, and Research Integrity and Assessment.

Innovative publishing practices in the arts and humanities


This session aims to address challenges and innovative models related to multilingualism within bibliodiversity in the Social Sciences and Humanities (SSH). The role of language in research practices tends to be considered secondary in STEM disciplines (Science, Technology, Engineering and Mathematics), as there seems to be a tacit assumption that English is widely accepted as the language of communication. English tends to be promoted in (inter)national and European research and innovation policies, which are mainly written in English and make scarce reference to language use or multilingualism. In this context, SSH-specific needs regarding scholarly communication in native languages have to be addressed: in those disciplines where language and concepts are very often not only means of communication but objects of research themselves, the use of the mother tongue is indispensable for in-depth understanding and for knowledge co-creation and sharing. In this setting, the challenge of multilingualism should sit alongside the concept of ‘bibliodiversity’, coined by the International Alliance of Independent Publishers, which refers to “cultural diversity applied to the world of books”; that is, it underlines the need to encompass a diversity of languages, scientific areas, publication formats, and actors. There are firm grounds to state that bibliodiversity, through multilingual publishing, is an efficient way of protecting national languages and enhancing different academic rhetorical traditions, by reaching specialists and wider audiences in a complementary way. Therefore, it is of the utmost relevance to understand how bibliodiversity, in its manifold formats and multilingual forms, is promoted through innovative practices and high-level programmatic involvement. To illustrate this, the session will present the OPERAS consortium at large, as well as the more specific scope of the recently EU-funded project TRIPLE.
Finally, a practical approach will be taken, using the UC Digitalis ecosystem as a reference, based on the experience of Coimbra University Press.

Delfim Leão - Innovative Publishing Practices in the Arts and Humanities

Speaker for this session

  • Delfim Leão

    Delfim Leão is Full Professor at the Institute of Classical Studies and currently Vice-Rector for Culture and Open Science at the University of Coimbra. He has been Director of Coimbra University Press since 2011 and was formerly President of the Portuguese Association of Higher Education Presses (2011-2014). He has been the Portuguese representative at the OPERAS consortium core group since 2017 and participates in the recently funded EU projects TRIPLE (Targeting Researchers through Innovative Practices and multiLingual Exploration, 2019-2022) and OPERAS-P (Preparing Open Access in the European Research Area through Scholarly Communication, 2019-2021). He is particularly active in the area of Multilingualism. His scientific and professional activities include the development of two specialized digital platforms: Classica Digitalia and UC Digitalis.