
ExploreCor - Using Programmable Corpora in Computational Literary Studies

About the Training School

The academic training school, “ExploreCor: Using Programmable Corpora in Computational Literary Studies,” took place in Vienna in June 2024, spanning three intensive days. The unique programme was designed to provide participants with a comprehensive understanding of the research cycle in Computational Literary Studies, equipping them with the skills needed to navigate the evolving landscape of digital humanities.

The training school began by delving into the critical process of finding and evaluating corpora of literary texts. Participants explored the concept of Programmable Corpora, a pivotal aspect of the curriculum. Programmable Corpora are dynamic collections of literary works that can be manipulated programmatically, allowing for customised and nuanced analyses.

The curriculum progressed to the formulation of research questions and the subsequent execution of analyses using Python and Jupyter Notebooks. Attendees worked with DraCor, the Drama Corpora Platform, which is designed for efficient and flexible analysis of literary texts. The training also incorporated the CLSCor catalogue, a Linked Data-powered resource developed within the CLS INFRA project, enabling participants to explore and select corpora for their research.
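As a minimal sketch of what programmatic access looks like from a notebook, the Python below builds DraCor API request URLs and downloads JSON using only the standard library. The base URL `https://dracor.org/api/v1` and the path layout (`corpora/<corpus>/plays/<play>/metrics`) reflect our understanding of the public DraCor API and should be checked against its documentation. The helper names `api_url` and `fetch_json` are our own, and a local DraCor instance would need a different `base`.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL of the public DraCor API (version 1). Pass a different
# `base` to target another instance, e.g. a local Docker-based one.
DRACOR_API = "https://dracor.org/api/v1"


def api_url(*segments: str, base: str = DRACOR_API) -> str:
    """Join URL-encoded path segments onto the API base URL."""
    root = base.rstrip("/")
    if not segments:
        return root
    # Encode every segment fully so identifiers cannot break the path.
    return root + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_json(url: str, timeout: float = 30.0):
    """Download a resource and decode it as JSON (needs network access)."""
    # Ask explicitly for JSON; the Accept header is a precaution.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

In a notebook with network access, `fetch_json(api_url("corpora"))` would return the list of available corpora, and `fetch_json(api_url("corpora", "ger", "plays", "lessing-emilia-galotti", "metrics"))` the network metrics of a single play (the corpus and play identifiers are illustrative). Keeping URL construction separate from downloading makes it easy to switch between the public service and a local instance.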

An integral component of the training focused on APIs and Linked Data, emphasising the interconnected nature of literary datasets. Participants worked through exemplary research projects using the DraCor system, gaining hands-on experience in analysing digital literary networks. The training school also addressed repeatable research, an open challenge in the Digital Humanities, and explored methods for making research replicable.
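Linked Data represents information as subject, predicate and object triples, and a SPARQL query matches patterns of such triples, joining them through shared variables. The Python below is a toy, standard-library illustration of that idea (a "basic graph pattern" evaluated over an in-memory list of triples). It is not an RDF library or a SPARQL engine, and the `ex:` identifiers are hypothetical, not part of the CLSCor or DraCor vocabularies.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical example data: two plays in two corpora, one with an author.
EXAMPLE_GRAPH: List[Triple] = [
    ("ex:play1", "rdf:type", "ex:Play"),
    ("ex:play1", "ex:partOf", "ex:corpusA"),
    ("ex:play1", "ex:author", "ex:author1"),
    ("ex:author1", "rdfs:label", "Author One"),
    ("ex:play2", "rdf:type", "ex:Play"),
    ("ex:play2", "ex:partOf", "ex:corpusB"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that is already bound must keep the same value.
            if p_term in result and result[p_term] != t_term:
                return None
            result[p_term] = t_term
        elif p_term != t_term:
            return None
    return result


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, joining on shared variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

The query `[("?play", "ex:partOf", "ex:corpusA"), ("?play", "ex:author", "?a"), ("?a", "rdfs:label", "?name")]` corresponds roughly to the SPARQL pattern `?play ex:partOf ex:corpusA . ?play ex:author ?a . ?a rdfs:label ?name`: the shared variables `?play` and `?a` are what link separate statements together.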

Digital Literary Network Analysis was a key topic of the programme, giving participants the tools to uncover intricate relationships within literary texts. A dedicated segment addressed the reproducibility crisis in the Digital Humanities, underscoring the importance of transparent and replicable research practices.
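Character networks in dramatic network analysis are commonly derived from co-presence: two characters are linked when they appear (for example, speak) in the same segment, such as a scene, and the edge weight counts the segments they share. As we understand it, this is also the principle behind DraCor's networks. The Python below is a minimal sketch of that construction; the input format (one list of speaker names per segment) and the character names in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def copresence_network(segments: Iterable[Iterable[str]]) -> Dict[Edge, int]:
    """Build a weighted, undirected co-presence network.

    Each segment is a collection of the characters present in it. Every pair
    of characters sharing a segment gets an edge; the weight is the number
    of segments the pair shares.
    """
    weights: Counter = Counter()
    for speakers in segments:
        # Deduplicate and sort so each pair is counted once, in a stable order.
        for a, b in combinations(sorted(set(speakers)), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

For the hypothetical segments `[["Anna", "Ben"], ["Ben", "Clara", "Anna"], ["Clara", "Dora"], ["Anna", "Ben"]]`, Anna and Ben share three segments and therefore get an edge of weight 3, while every other pair that meets does so only once.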

To ensure the longevity and accessibility of research outcomes, the training school introduced Docker as a valuable tool. Participants learnt how to leverage Docker to encapsulate their research environments, enhancing the reproducibility of their findings.
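Alongside a containerised environment, it can help to record exactly which version of a corpus an analysis ran on. The Python below is a hypothetical helper, not part of DraCor or Docker. It computes a SHA-256 fingerprint over the TEI-XML files of a local corpus directory (for example, a cloned corpus repository), so that a later rerun can check that it sees byte-identical data.

```python
import hashlib
from pathlib import Path


def corpus_fingerprint(directory: str, pattern: str = "*.xml") -> str:
    """Return a SHA-256 hex digest over all files matching `pattern`.

    Files are visited in sorted order of their relative POSIX paths. Each
    path and each file's contents are hashed with a length prefix, so that
    adding, removing, renaming or editing a matching file changes the result.
    """
    root = Path(directory)
    files = [p for p in root.rglob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        for chunk in (rel, path.read_bytes()):
            # The length prefix keeps path and content boundaries unambiguous.
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the fingerprint next to the results (for instance, in the notebook output) gives a simple check: if a rerun inside the same container reports a different value, the input data, not just the code, has changed.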

By combining theoretical foundations with practical applications, participants left equipped to navigate the complexities of programmable corpora, digital literary analysis, and reproducible research, contributing to the ongoing advancement of the field.

Credits

This training event was organised through the considerable effort of many people, several of whom played multiple roles.

Preparatory Information

Software Downloads

The following software was used in the practical parts of this workshop. We invite you to download it if you wish to use this learning resource as a practical guide to the methods and techniques covered.

Gephi

Download and install the latest version of “Gephi” (https://gephi.org)

Docker

Download and install “Docker Desktop” (https://www.docker.com/products/docker-desktop/)

After installing Docker, go to https://github.com/dracor-org/dracor-explorecor and follow the instructions in the “Setup of a Local DraCor Environment” section. At the end of this process, a local Jupyter Lab instance should be running at http://localhost:8889.
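If the notebook server does not seem to respond, a plain TCP connection attempt is a quick way to confirm that something is listening on the expected port. The Python below is a minimal, hypothetical helper using only the standard library; it checks only that the port accepts connections, not that the service behind it is Jupyter Lab.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8889, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

Called with its defaults, `is_listening()` checks `localhost:8889`. If it returns `False`, the container is probably not running or is mapped to a different port.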

Further Reading

Börner, I., & Trilcke, P. (2023). CLS INFRA D7.1 On Programmable Corpora (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7664964

Börner, I., & Trilcke, P. (2024). CLS INFRA D7.3 On Versioning Living and Programmable Corpora (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.11081934

Ďurčo, M., Charvat, V. M., Börner, I., Mrugalski, M., & Odebrecht, C. (2022). CLS INFRA D6.1 Inventory of existing data sources and formats. Zenodo. https://doi.org/10.5281/zenodo.7520287

Ďurčo, M., Charvát, V. M., & Resch, S. (2025). CLS INFRA D6.2 Transformation toolbox & ingest and processing workflow. Zenodo. https://doi.org/10.5281/zenodo.14998374

Mrugalski, M., Odebrecht, C., Charvat, V., Börner, I., & Ďurčo, M. (2022). CLS INFRA D5.1 Review of the Data Landscape. Zenodo. https://doi.org/10.5281/zenodo.6861022

Schöch, C. (2023). Repetitive research: a conceptual space and terminology of replication, reproduction, revision, reanalysis, reinvestigation and reuse in digital humanities. _International Journal of Digital Humanities_, 5, 373–403. https://doi.org/10.1007/s42803-023-00073-y


  1. Introduction to Programmable Corpora

    In Computational Literary Studies, one research object that hardly plays a role in traditional literary studies has proven particularly relevant: the corpus. In this introduction to the “ExploreCor” Training School, we first reflect on working with literary corpora in Computational Literary Studies.

    Speaker
    • Peer Trilcke

      Peer Trilcke has been Professor of Modern German Literature at the University of Potsdam since 2016. Since 2017 he has been Director of the Theodor Fontane Archive, an institution of the University of Potsdam, and since 2018 head of the Network for Digital Humanities at the University of Potsdam. He is a member of the working groups “Scientific practice” and “Digital collection” of the “Digital Information” Initiative of the Alliance of Science Organizations in Germany. His work focuses on the research-based development of infrastructures for literary corpora and on the quantitative analysis of literary texts. Peer is one of the editors of the multilingual DraCor service (Drama Corpora Platform) and of the Journal of Computational Literary Studies (JCLS).

  2. Introducing DraCor

    In Computational Literary Studies, one research object that hardly plays a role in traditional literary studies has proven particularly relevant: the corpus. In this introduction to the “ExploreCor” Training School, we present DraCor, the Drama Corpora Project, and its digital ecosystem as a prototype of “Programmable Corpora”.

    Speaker
    • Frank Fischer

      Frank Fischer is Professor of Digital Humanities at Freie Universität Berlin. He holds a Master’s Degree in Computer Science and German Studies from Leipzig University and received his PhD from the University of Jena with a study on revenge drama in the Enlightenment. From 2017 to 2021 he was director of DARIAH-EU, the pan-European digital research infrastructure for the Arts and Humanities. He is founder and editor-in-chief of DraCor (https://dracor.org/), a multilingual platform dedicated to digital research on European drama.

  3. Using DraCor: Four Showcases

    In Computational Literary Studies, one research object that hardly plays a role in traditional literary studies has proven particularly relevant: the corpus. In this part of the introduction to the “ExploreCor” Training School, we demonstrate how to use the provided dockerised DraCor research environment and the bundled Jupyter Lab instance to do research with the DraCor API.

    Speaker
    • Ingo Börner

      Ingo Börner studied Russian and German Philology at the University of Vienna. He worked as a teaching and research associate at the Department for German Studies of the University of Vienna and at the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences. He has been involved in the development of the DraCor platform, focusing mainly on data modelling and schema development. In July 2021 he joined the University of Potsdam, Germany, as a research associate in the CLS INFRA project to continue the work on DraCor and explore the potential of “Programmable Corpora” for literary studies. His research interests lie in computational literary studies, digital editions and semantic web technologies.

  4. Introduction to Linked Open Data for Beginners

    Linked Open Data (LOD) refers to datasets that are publicly available, can be linked to other datasets, and can be interpreted and used not only by humans but also by machines. This presentation outlines the main principles of LOD and its underlying framework, the Semantic Web. Technical aspects are also covered, including how to formalise LOD using the Resource Description Framework (RDF), ontologies, and controlled vocabularies; how to express it using a human-readable syntax (Turtle); and how to search through it using the SPARQL query language.

    Speaker
    • Massimiliano Carloni

      Massimiliano Carloni is a data modeller and repository manager at the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH). His main interests lie in long-term digital preservation and semantic technologies, with a special focus on graph-based data models and Linked Open Data. He is part of the managing team behind the archive for digital research data ARCHE and has contributed to the creation of the new library catalogue of the Austrian Academy of Sciences. He is currently working in the project ATRIUM, which aims to facilitate digital methods and improve data and service interoperability in the Arts and Humanities. He is the main person responsible for the Vocabs service at ACDH-CH and DARIAH-EU.

  5. Exploring Programmable Corpora 1: A Case Study in Conducting Research with the DraCor API

    A key component of Programmable Corpora is the research-driven API, which makes it particularly easy to retrieve and process corpus data generated for specific research questions. The DraCor API was developed with a particular focus on network analysis. This session introduces the method of network analysis and presents the DraCor API and its possibilities in detail.

    Speaker
    • Peer Trilcke

      Peer Trilcke has been Professor of Modern German Literature at the University of Potsdam since 2016. Since 2017 he has been Director of the Theodor Fontane Archive, an institution of the University of Potsdam, and since 2018 head of the Network for Digital Humanities at the University of Potsdam. He is a member of the working groups “Scientific practice” and “Digital collection” of the “Digital Information” Initiative of the Alliance of Science Organizations in Germany. His work focuses on the research-based development of infrastructures for literary corpora and on the quantitative analysis of literary texts. Peer is one of the editors of the multilingual DraCor service (Drama Corpora Platform) and of the Journal of Computational Literary Studies (JCLS).

  6. Exploring Programmable Corpora 2: Introducing Network Analysis

    Continuing the demonstration of how DraCor can be used for network analysis, this session opens with an introduction to dramatic network analysis and its theoretical background. It then presents a range of network measures relevant to the analysis of networks in dramatic literature and, finally, discusses the potential of dramatic network analysis for literary studies.

    Speaker
    • Julia Jennifer Beine

      Julia Jennifer Beine is an interdisciplinary researcher in the fields of Classical Philology, General and Comparative Literature, and Digital Humanities. She received her PhD in Latin Philology from the Ruhr University Bochum, doing research on the scheming slave (“servus callidus”) as a central figure in ancient and early modern European comedy. She is co-editor of the dracor.org platform and the incorporated digital corpora RomDraCor, GreekDraCor, and NeoLatDraCor.

  7. Reproducibility 1: Replication or Prediction or What?

    As a result of the so-called “reproducibility crisis”, making research repeatable has become a crucial topic in the empirical and technical sciences. In Computational Literary Studies (CLS), there is still a shortage of both a culture of repetitive research and user-friendly technical solutions. In this talk, Christof Schöch introduces a theoretical framework for describing modes of repeating research in the Digital Humanities.

    Speaker
    • Christof Schöch

      Christof Schöch is Professor of Digital Humanities at the University of Trier, Germany, and Co-Director of the Trier Center for Digital Humanities. He is also chair of the COST Action Distant Reading for European Literary History and president of the Digital Humanities Association for the German-speaking area (DHd). Christof studied Romance languages, English and Psychology in Freiburg and Tours. His master’s thesis was on the contemporary French writer François Bon. In 2008, he obtained his PhD in French Literature with a study of ‘La Description double dans le roman des Lumières 1760-1800’ (Kassel / Paris IV-Sorbonne). The thesis was awarded the Prix Germaine de Staël in 2010 and published by Classiques Garnier. From 2004 to 2011, he was a research assistant at the Institute of Romance Languages and Literatures at Kassel University. From 2011 to 2017, he was a research associate at the Department for Literary Computing at the University of Würzburg, first as a researcher in the DARIAH-DE (Digital Research Infrastructure for the Arts and Humanities) initiative, then as leader of the Computational Literary Genre Stylistics group. In 2017, he received an offer to join Trier University as Full Professor of Digital Humanities. Christof’s research and teaching interests lie at the confluence of French literary studies and Digital Humanities. His methodological focus is on Computational Literary Studies (quantitative methods of text analysis, building of digital textual resources, legal aspects). In terms of materials, he focuses on French Classical and Enlightenment drama as well as on the modern and contemporary French novel. He is also interested in new forms of scholarly publishing and collaboration and advocates Open Access / Open Science in the Humanities. He is an active member of the Romance Studies and Digital Humanities communities. Further information: https://christof-schoech.de/en/

  8. Reproducibility 2: Reproducible Research with DraCor

    This session demonstrates how to use the available Docker images of DraCor, together with GitHub, to set up stable local DraCor corpora that allow research to be replicated. After viewing this session, learners should be able to create their own custom corpora and will know about strategies for sharing them.

    Speaker
    • Ingo Börner

      Ingo Börner studied Russian and German Philology at the University of Vienna. He worked as a teaching and research associate at the Department for German Studies of the University of Vienna and at the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences. He has been involved in the development of the DraCor platform, focusing mainly on data modelling and schema development. In July 2021 he joined the University of Potsdam, Germany, as a research associate in the CLS INFRA project to continue the work on DraCor and explore the potential of “Programmable Corpora” for literary studies. His research interests lie in computational literary studies, digital editions and semantic web technologies.
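For the network-analysis sessions above, the DraCor API can deliver a play's character network as an edge list. The Python below is a sketch that tallies, per character, the number of neighbours and the sum of edge weights from such a list. It assumes the comma-separated layout with `Source`, `Type`, `Target` and `Weight` columns that we understand the `networkdata/csv` endpoint to return; please check a real response before relying on it. The helper name `degree_table`, the sample rows and the character identifiers are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple


def degree_table(csv_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each character to (degree, weighted degree) from an undirected edge list.

    Assumes one row per edge with `Source`, `Target` and integer `Weight` columns.
    """
    table: Dict[str, Tuple[int, int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        weight = int(row["Weight"])
        # An undirected edge counts towards both of its endpoints.
        for node in (row["Source"], row["Target"]):
            degree, strength = table.get(node, (0, 0))
            table[node] = (degree + 1, strength + weight)
    return table
```

`csv.DictReader` also handles quoted fields, so the function works whether or not the values in the header and rows are wrapped in quotation marks.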
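Network measures of the kind presented in “Exploring Programmable Corpora 2”, such as density, average degree, average shortest-path length, diameter and average clustering, can be computed on a small undirected network with the standard library alone. The sketch below uses textbook definitions: path statistics are taken over connected pairs only, and nodes with fewer than two neighbours count as having clustering 0. DraCor's own metrics may follow different conventions, and the four-character example network is hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency-set graph from (a, b) pairs."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def density(g: Graph) -> float:
    """Share of possible edges that are present: 2m / (n(n - 1))."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def average_degree(g: Graph) -> float:
    return sum(len(nbrs) for nbrs in g.values()) / len(g) if g else 0.0


def bfs_distances(g: Graph, source: str) -> Dict[str, int]:
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_stats(g: Graph) -> Tuple[float, int]:
    """Average shortest-path length and diameter over connected ordered pairs."""
    lengths = [d for s in g for t, d in bfs_distances(g, s).items() if t != s]
    if not lengths:
        return 0.0, 0
    return sum(lengths) / len(lengths), max(lengths)


def local_clustering(g: Graph, node: str) -> float:
    """Fraction of pairs of `node`'s neighbours that are themselves linked."""
    nbrs = g[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2 * links / (k * (k - 1))


def average_clustering(g: Graph) -> float:
    return sum(local_clustering(g, v) for v in g) / len(g) if g else 0.0
```

For a hypothetical network with the edges Anna–Ben, Anna–Clara, Ben–Clara and Clara–Dora, these definitions give a density of 2/3, an average degree of 2, a diameter of 2 and an average clustering of 7/12.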

Cite as

Julia Jennifer Beine, Ingo Börner, Floor Buschenhenke, Dîlan Canan Çakir, Massimiliano Carloni, Anna Dijkstra, Matej Ďurčo, Frank Fischer, Vicky Garnett, Sarah Hoover, Victor J. Illmer, Bartłomiej Kunda, Carsten Milling, Lukas Plank, Jonas Rohe, Christof Schöch, Justin Tonra, Peer Trilcke, Maria Wiederänders, Anna Woldrich and Katharina Wünsche (2025). ExploreCor - Using Programmable Corpora in Computational Literary Studies. Version 1.0.0. DARIAH Campus [Event]. https://hdl.handle.net/21.11159/0196241c-c361-76ce-a1b5-6fdd17d0d776

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
ExploreCor - Using Programmable Corpora in Computational Literary Studies
Authors:
Julia Jennifer Beine, Ingo Börner, Floor Buschenhenke, Dîlan Canan Çakir, Massimiliano Carloni, Anna Dijkstra, Matej Ďurčo, Frank Fischer, Vicky Garnett, Sarah Hoover, Victor J. Illmer, Bartłomiej Kunda, Carsten Milling, Lukas Plank, Jonas Rohe, Christof Schöch, Justin Tonra, Peer Trilcke, Maria Wiederänders, Anna Woldrich, Katharina Wünsche
Domain:
Social Sciences and Humanities
Language:
English
Published to DARIAH-Campus:
09/04/2025
Content type:
Event
License:
CC BY 4.0
Sources:
DARIAH
Topics:
Linked Open Data, Python, Semantic Web, Corpus Analysis
Version:
1.0.0