Interrogating a National Narrative with GPT-2

This lesson teaches you how to apply Generative Pre-trained Transformer 2 (GPT-2), one of the largest openly available language models at the time of its 2019 release, to a large-scale text corpus. The model then produces automatically written responses to prompts based on the contents of the corpus, which helps you locate the broader themes and trends that emerge from your sources. This method of analysis is useful for historical inquiry because it allows a narrative crafted over years and thousands of texts to be aggregated and condensed, then analyzed through direct inquiry. In essence, it allows you to “talk” to your sources.

To do this, we will use an implementation of GPT-2 that is wrapped in a Python package to simplify the fine-tuning of an existing machine learning model. Although the code in this tutorial is not complex, learning this method for exploratory data analysis will give you insight into common machine learning terminology and concepts that can be applied to other branches of machine learning. Beyond interrogating history, we will also interrogate the ethics of producing this form of research, from its broader impact on the environment to how even a single passage of generated text can be misinterpreted and recontextualized.

Learning outcomes

After completing this lesson, you will be able to:

  • Apply GPT-2 to a large-scale text corpus in order to produce automatically written responses to prompts based on the contents of the corpus
  • Recognize common machine learning terminology and concepts that can be applied to other branches of machine learning
  • Understand the ethical complications of producing this form of research
Interested in learning more?

Check out this lesson on Programming Historian's website

Cite as

Chantal Brousseau, Katherine McDonough and Lorella Viola (2022). Interrogating a National Narrative with GPT-2. Version 1.0.0. Edited by John R Ladd and Tiago Sousa Garcia. ProgHist Ltd. [Training module]. https://doi.org/10.46430/phen0104

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
Interrogating a National Narrative with GPT-2
Authors:
Chantal Brousseau
Domain:
Social Sciences and Humanities
Language:
en
Published:
4/12/2024
Content type:
Training module
Licence:
CC BY 4.0
Sources:
Programming Historian
Topics:
Python, Machine Learning, Artificial Intelligence
Version:
1.0.0