ExploreCor - Using Programmable Corpora in Computational Literary Studies

Location

Vienna, Austria

Date

10–12 June 2024

Authors

and 17 more

Topics

About the Training School

The academic training school, “ExploreCor: Using Programmable Corpora in Computational Literary Studies,” took place in Vienna in June 2024, spanning three intensive days. The unique programme was designed to provide participants with a comprehensive understanding of the research cycle in Computational Literary Studies, equipping them with the skills needed to navigate the evolving landscape of digital humanities.

The training school began by delving into the critical process of finding and evaluating corpora of literary texts. Participants explored the concept of Programmable Corpora, a pivotal aspect of the curriculum. Programmable Corpora are dynamic collections of literary works that can be manipulated programmatically, allowing for customised and nuanced analyses.

The curriculum progressed to the formulation of research questions and the subsequent execution of analyses using Python and Jupyter Notebooks. Attendees utilised the tool DraCor, designed for efficient and flexible literary text analysis. Additionally, the training incorporated the CLSCor catalogue, a Linked Data-powered resource developed within the CLS INFRA project, enabling students to explore and select corpora for their research.

An integral component of the training focused on APIs and Linked Data, emphasising the interconnected nature of literary datasets. Students engaged in exemplary research projects using the DraCor system, gaining hands-on experience in navigating digital literary networks. The training school addressed the issue of Repeatable Research, a challenge in the Digital Humanities landscape, and explored methods to ensure research replicability.

Digital Literary Network Analysis was a key topic of the programme, providing participants with the tools to uncover relationships between characters within literary texts. A dedicated segment addressed the Reproducibility Crisis in Digital Humanities, underscoring the importance of transparent and replicable research practices.

To ensure the longevity and accessibility of research outcomes, the training school introduced Docker as a valuable tool. Participants learnt how to leverage Docker to encapsulate their research environments, enhancing the reproducibility of their findings.

By combining theoretical foundations with practical applications, participants left equipped to navigate the complexities of programmable corpora, digital literary analysis, and reproducible research, contributing to the ongoing advancement of the field.

Credits

This training event was organised with the considerable effort of many people, many of whom played multiple roles.

Preparatory Information

Software Downloads

The following software was used in the practical parts of this workshop. Install it if you wish to use this learning resource as a hands-on guide to the methods and techniques covered.

Gephi

Download and install the latest version of “Gephi” (https://gephi.org)

Docker

Download and install “Docker Desktop” (https://www.docker.com/products/docker-desktop/)

After having installed “Docker”, go to https://github.com/dracor-org/dracor-explorecor and follow the instructions in the “Setup of a Local DraCor Environment” section. At the end of this process, there should be a local Jupyter Lab instance running under http://localhost:8889

1. Introduction to Programmable Corpora
For Computational Literary Studies, one research object has proven to be of particular relevance that hardly plays a role in traditional literary studies: the corpus. In this introduction to the “ExploreCor” Training School, we first reflect on working with literary corpora in Computational Literary Studies.
Speaker
Peer Trilcke
Peer Trilcke has been Professor of Modern German Literature at the University of Potsdam since 2016, Director of the Theodor Fontane Archive, an institution of the University of Potsdam, since 2017, and head of the Network for Digital Humanities at the University of Potsdam since 2018. He is a member of the working groups “Scientific practice” and “Digital collection” of the “Digital Information” Initiative by the Alliance of Science Organizations in Germany. His work focuses on the research-based development of infrastructures for literary corpora and the quantitative analysis of literary texts. Peer is one of the editors of the multilingual DraCor service (Drama Corpora Platform) and of the Journal of Computational Literary Studies (JCLS).
Link
'Intro to Programmable Corpora: Corpora as Research Objects in Literary Studies' presentation slides.
2. Introducing DraCor
For Computational Literary Studies, one research object has proven to be of particular relevance that hardly plays a role in traditional literary studies: the corpus. In this introduction to the “ExploreCor” Training School, we present DraCor, the Drama Corpora Project, and its digital ecosystem as a prototype of “Programmable Corpora”.
Speaker
Frank Fischer
Frank Fischer is Professor of Digital Humanities at Freie Universität Berlin. He holds a Master’s Degree in Computer Science and German Studies from Leipzig University and received his PhD from the University of Jena with a study on revenge drama in the Enlightenment. From 2017 to 2021 he was director of DARIAH-EU, the pan-European digital research infrastructure for the Arts and Humanities. He is founder and editor-in-chief of DraCor (https://dracor.org/), a multilingual platform dedicated to digital research on European drama.
Link
'Introduction to Programmable Corpora: Introducing DraCor' presentation slides
3. Using DraCor: Four Showcases
For Computational Literary Studies, one research object has proven to be of particular relevance that hardly plays a role in traditional literary studies: the corpus. In this part of the introduction to the “ExploreCor” Training School, we demonstrate how to use the provided dockerised DraCor research environment and the bundled Jupyter Lab instance to do research with the DraCor API.
Speaker
Ingo Börner
Ingo Börner studied Russian and German Philology at the University of Vienna. He worked as a teaching and research associate at the Department for German Studies of the University of Vienna and the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences. He has been involved in the development of the DraCor platform, focusing mainly on data modelling and schema development. In July 2021 he joined the University of Potsdam, Germany, as a research associate in the CLS INFRA project to continue the work on DraCor and explore the potential of “Programmable Corpora” for literary studies. His research interests are in the field of computational literary studies, digital editions and semantic web technologies.
Link
'Introduction to Programmable Corpora: Using DraCor – Four Showcases' presentation slides
4. Introduction to Linked Open Data for Beginners
Linked Open Data (LOD) refers to datasets that are publicly available, can be linked to other datasets, and can be interpreted and used not only by humans but also by machines. This presentation outlines the main principles of LOD and its underlying framework, the Semantic Web. Technical aspects are also covered, including how to formalise LOD using the Resource Description Framework (RDF), ontologies, and controlled vocabularies; how to express it using a human-readable syntax (Turtle); and how to search through it using the SPARQL query language.
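The triple-and-query idea described above can be illustrated with a minimal sketch in plain Python. It is not an RDF library and does not run SPARQL: it only mimics how statements are stored as subject, predicate, object triples and how a query pattern with a wildcard selects some of them. All identifiers (ex:Play1, ex:author and so on) are invented for this example.

```python
# Toy illustration only: a real project would use an RDF library and SPARQL.
# All identifiers below (ex:Play1, ex:author, ...) are made up for this sketch.

triples = {
    ("ex:Play1", "rdf:type", "ex:Play"),
    ("ex:Play1", "ex:title", "A Hypothetical Comedy"),
    ("ex:Play1", "ex:author", "ex:Author1"),
    ("ex:Play2", "rdf:type", "ex:Play"),
    ("ex:Play2", "ex:author", "ex:Author1"),
    ("ex:Author1", "ex:name", "Jane Example"),
}


def match(pattern, store):
    """Return all triples matching a (subject, predicate, object) pattern.

    None acts as a wildcard, loosely like a variable in a SPARQL triple pattern.
    """
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# "Which resources have ex:Author1 as their author?"
plays_by_author = [s for s, _, _ in match((None, "ex:author", "ex:Author1"), triples)]
print(plays_by_author)
```

Because every statement has the same three-part shape, data from different sources can be merged simply by combining their sets of triples, which is the property that makes linking datasets possible.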
Speaker
Massimiliano Carloni
Massimiliano Carloni is a data modeller and repository manager at the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH). His main interests lie in long-term digital preservation and semantic technologies, with a special focus on graph-based data models and Linked Open Data. He is part of the managing team behind the archive for digital research data ARCHE and has contributed to the creation of the new library catalogue of the Austrian Academy of Sciences. He is currently working on the ATRIUM project, which aims to facilitate digital methods and improve data and service interoperability in the Arts and Humanities. He is the main person responsible for the Vocabs service at ACDH-CH and DARIAH-EU.
Link
'Linked Open Data: An Introduction' presentation slides
5. Exploring Programmable Corpora 1: A Case Study in Conducting Research with the DraCor API
A key component of Programmable Corpora is the research-driven API, which makes it particularly easy to retrieve and process corpus data that have been generated for specific research questions. The DraCor API was developed especially for the network analysis method. This session introduces the method of network analysis and presents the DraCor API and its possibilities in detail.
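As a rough illustration of what retrieving corpus data through an API can look like, the following sketch builds a request URL and defines a small helper for downloading JSON. The base URL and the path layout are assumptions about version 1 of the public DraCor API and should be checked against the API documentation; the corpus and play identifiers are hypothetical placeholders, and `metrics_url` and `fetch_json` are names invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL and path layout of the public DraCor API (version 1).
# Verify both against the API documentation before relying on them.
API_BASE = "https://dracor.org/api/v1"


def metrics_url(corpus, play, base=API_BASE):
    """Build the (assumed) URL for the network metrics of a single play."""
    return f"{base}/corpora/{quote(corpus)}/plays/{quote(play)}/metrics"


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical identifiers, used only to show the shape of the URL.
print(metrics_url("example-corpus", "example-play"))
```

Keeping the base URL as a parameter means the same code could be pointed at a locally running instance instead of the public service, which matters for the reproducibility sessions later in the programme.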
Speaker
Peer Trilcke
Peer Trilcke has been Professor of Modern German Literature at the University of Potsdam since 2016, Director of the Theodor Fontane Archive, an institution of the University of Potsdam, since 2017, and head of the Network for Digital Humanities at the University of Potsdam since 2018. He is a member of the working groups “Scientific practice” and “Digital collection” of the “Digital Information” Initiative by the Alliance of Science Organizations in Germany. His work focuses on the research-based development of infrastructures for literary corpora and the quantitative analysis of literary texts. Peer is one of the editors of the multilingual DraCor service (Drama Corpora Platform) and of the Journal of Computational Literary Studies (JCLS).
Link
'Exploring Programmable Corpora 1: A Case Study in Conducting Research with the DraCor API' presentation slides.
6. Exploring Programmable Corpora 2: Introducing Network Analysis
Continuing to demonstrate how DraCor can be used for network analysis, the first part of this session is an introduction to dramatic network analysis. It outlines the theoretical background of dramatic network analysis. Second, it presents a range of network measures relevant to the analysis of networks in dramatic literature. Finally, it discusses the potential of dramatic network analysis in the field of literary studies.
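To make the notion of network measures more concrete, here is a minimal sketch that builds a character network from a toy play and computes three basic measures: degree, average degree, and density. It assumes that two characters are linked when they appear in the same scene, which is one common way to define links but not necessarily the definition used in the session. The play and its character names are invented.

```python
from itertools import combinations

# Hypothetical toy play: each scene lists the characters who speak in it.
scenes = [
    ["Anna", "Bert"],
    ["Anna", "Clara", "Bert"],
    ["Clara", "Dora"],
]

# Build an undirected co-occurrence network: two characters are linked
# if they appear in at least one scene together.
edges = set()
for cast in scenes:
    for a, b in combinations(sorted(set(cast)), 2):
        edges.add((a, b))

nodes = sorted({name for cast in scenes for name in cast})

# Degree: number of distinct characters each character is linked to.
degree = {name: 0 for name in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Density: share of realised links among all possible links.
n = len(nodes)
density = len(edges) / (n * (n - 1) / 2)

# Average degree: mean number of links per character.
average_degree = sum(degree.values()) / n

print(degree, round(density, 2), average_degree)
```

In this toy network Clara has the highest degree because she connects Dora to the rest of the cast, the kind of structural observation that such measures are meant to surface.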
Speaker
Julia Jennifer Beine
Julia Jennifer Beine is an interdisciplinary researcher in the fields of Classical Philology, General and Comparative Literature, and Digital Humanities. She received her PhD in Latin Philology from the Ruhr University Bochum, doing research on the scheming slave (“servus callidus”) as a central figure in ancient and early modern European comedy. She is co-editor of the dracor.org platform and the incorporated digital corpora RomDraCor, GreekDraCor, and NeoLatDraCor.
Link
'Exploring Programmable Corpora: Introducing Network Analysis' presentation slides.
7. Reproducibility 1: Replication or Prediction or What?
As a result of the so-called “reproducibility crisis”, making research repeatable has become a crucial topic in the empirical and technical sciences. In Computational Literary Studies (CLS), there is still a shortage of both a culture of repeating research and user-friendly technical solutions. In this talk, Christof Schöch introduces a theoretical framework to describe modes of repeating research in Digital Humanities.
Speaker
Christof Schöch
Christof Schöch is Professor of Digital Humanities at the University of Trier, Germany, and Co-Director of the Trier Center for Digital Humanities. He is also chair of the COST Action Distant Reading for European Literary History and president of the Digital Humanities Association for the German-speaking area (DHd). Christof studied Romance languages, English and Psychology in Freiburg and Tours. His master’s thesis was on French contemporary writer François Bon. In 2008, he obtained his PhD in French Literature with a study of ‘La Description double dans le roman des Lumières 1760-1800’ (Kassel / Paris IV-Sorbonne). The thesis was awarded the Prix Germaine de Stael 2010 and published with Classiques Garnier. From 2004 to 2011, he was a research assistant at the Institute of Romance Languages and Literatures at Kassel University. From 2011 to 2017, he was a research associate at the Department for Literary Computing at the University of Würzburg, first as a researcher in the DARIAH-DE (Digital Research Infrastructure for the Arts and Humanities) initiative, then as leader of the Computational Literary Genre Stylistics group. In 2017, he received the offer to join Trier University as Full Professor of Digital Humanities. Christof’s interests in research and teaching are located at the confluence of French literary studies and Digital Humanities. His methodological focus is on Computational Literary Studies (quantitative methods of text analysis, building of digital textual resources, legal aspects). In terms of materials, he focuses on French Classical and Enlightenment drama as well as on the modern and contemporary French novel. He is also interested in new forms of scholarly publishing and collaboration and advocates Open Access / Open Science in the Humanities. He is an active member of the Romance Studies and Digital Humanities communities. Further information: https://christof-schoech.de/en/
Link
'Replication or Prediction or What?' presentation slides
8. Reproducibility 2: Reproducible Research with DraCor
This session demonstrates how to use the available Docker images of DraCor and GitHub to set up stable local DraCor corpora to allow for replication of research. After viewing this session, learners should be able to create their own custom corpora and know strategies for sharing them.
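One small, complementary check when sharing a custom corpus is to confirm that two copies contain exactly the same files. The sketch below is not part of the DraCor tooling and is not the method taught in the session; it only illustrates the general idea of a content fingerprint, using a hypothetical helper named `corpus_fingerprint` and in-memory sample data.

```python
import hashlib


def corpus_fingerprint(files):
    """Return a SHA-256 digest over file names and contents.

    `files` maps a file name to its content as bytes. Names are processed
    in sorted order so the result does not depend on insertion order.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        content = files[name]
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between name and content
        digest.update(len(content).to_bytes(8, "big"))  # length prefix
        digest.update(content)
    return digest.hexdigest()


# Hypothetical sample data standing in for the files of a small corpus.
original = {"play-a.xml": b"<TEI/>", "play-b.xml": b"<TEI>text</TEI>"}
same_files_other_order = {"play-b.xml": b"<TEI>text</TEI>", "play-a.xml": b"<TEI/>"}
modified = {"play-a.xml": b"<TEI/>", "play-b.xml": b"<TEI>changed</TEI>"}

print(corpus_fingerprint(original) == corpus_fingerprint(same_files_other_order))
print(corpus_fingerprint(original) == corpus_fingerprint(modified))
```

Publishing such a fingerprint alongside a study would let others verify that they are working with the same data before attempting a replication.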
Speaker
Ingo Börner
Ingo Börner studied Russian and German Philology at the University of Vienna. He worked as a teaching and research associate at the Department for German Studies of the University of Vienna and the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences. He has been involved in the development of the DraCor platform, focusing mainly on data modelling and schema development. In July 2021 he joined the University of Potsdam, Germany, as a research associate in the CLS INFRA project to continue the work on DraCor and explore the potential of “Programmable Corpora” for literary studies. His research interests are in the field of computational literary studies, digital editions and semantic web technologies.
Link
'Reproducibility in CLS Research: Reproducible Research with DraCor' presentation slides