Programming Historian

Programming Historian offers novice-friendly, peer-reviewed lessons that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching.

Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza
EN
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text. You'll learn how to use the Python packages NLTK, spaCy, and Stanza to analyze a multilingual Russian and French text.
Authors
Ian Goodale
Clustering and Visualising Documents Using Word Embeddings
EN
This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.
Authors
Jonathan Reades
Jennie Williams
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1)
EN
This is the first of a two-part lesson introducing deep learning-based computer vision methods for humanities research. Using a dataset of historical newspaper advertisements and the fastai Python library, the lesson walks through the pipeline of training a computer vision model to perform image classification.
Authors
Daniel van Strien
Kaspar Beelen
Melvin Wevers
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 2)
EN
This is the second of a two-part lesson introducing deep learning-based computer vision methods for humanities research. This lesson digs deeper into the details of training a deep learning-based computer vision model. It covers challenges that can arise from the training data, explains why choosing an appropriate metric for your model matters, and presents some methods for evaluating a model's performance.
Authors
Daniel van Strien
Kaspar Beelen
Melvin Wevers
Corpus Analysis with spaCy
EN
This lesson demonstrates how to use the Python library spaCy to analyze large collections of texts. It details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
Authors
Megan S. Kane
Creating Deep Convolutional Neural Networks for Image Classification
EN
This lesson provides a beginner-friendly introduction to convolutional neural networks (CNNs) for image classification. The tutorial builds a conceptual understanding of how neural networks work by using Google's Teachable Machine to train a model on paintings from the ArtUK database. This lesson also demonstrates how to use JavaScript to embed the model in a live website.
Authors
Nabeel Siddiqui
Creating GUIs in Python for Digital Humanities Projects
EN
In this lesson, you will use Qt Designer and Python to design and implement a simple graphical user interface and application to merge PDF files. This lesson also demonstrates how to package the application for distribution to other personal computers.
Authors
Christopher Goodwin
Creating Interactive Visualizations with Plotly
EN
This lesson demonstrates how to create interactive data visualizations in Python with Plotly's open-source graphing libraries using materials from the Historical Violence Database.
Authors
Grace Di Méo
Displaying a Georeferenced Map in KnightLab's StoryMap JS
EN
In this lesson from Programming Historian, you will learn how to display a georeferenced map from Map Warper in KnightLab's StoryMap JS, an interactive web-based map and storytelling platform.
Authors
Erica Y Hayes
Mia Partlow
Facial Recognition in Historical Photographs with Artificial Intelligence in Python
EN
In this lesson, you'll learn computer vision and machine learning principles for object recognition, and how to apply these principles using Python to recognize and classify smiling faces in historical photographs.
Authors
Charles Goldberg
Zach Haala
Finding Places in Text with the World Historical Gazetteer
EN
Researchers often need to search a corpus of texts for a defined list of terms, and historians are often interested in particular places named in a text or texts. This lesson details how to programmatically search documents for a list of terms, including place names, and then how to obtain coordinates and map historical place names with the World Historical Gazetteer.
Authors
Susan Grunewald
Andrew Janco
Interrogating a National Narrative with GPT-2
EN
In this lesson, you will learn how to apply a Generative Pre-trained Transformer language model to a large-scale corpus so that you can locate broad themes and trends within written text.
Authors
Chantal Brousseau
Introduction to Map Warper
EN
This lesson from Programming Historian introduces basic use of Map Warper for historical maps. It guides you from upload to export, demonstrating methods for georeferencing and producing visualizations.
Authors
Anthony Picón Rodríguez
Miguel Cuadros
Making an Interactive Web Application with R and Shiny
EN
This lesson demonstrates how to build a basic interactive web application using Shiny, a library (a set of additional functions) for the programming language R. In the lesson, you will design and implement a simple application consisting of a slider that lets a user select a date range; changing the range triggers some R code, which then displays the corresponding set of points on an interactive map.
Authors
Yann Ryan
Regression Analysis with Scikit-Learn (part 1 - Linear)
EN
This lesson is the first of a two-part lesson focusing on an indispensable set of data analysis methods, logistic and linear regression. It provides an overview of linear regression and walks through running both algorithms in Python (using Scikit-learn). The lesson also discusses interpreting the results of a regression model and some common pitfalls to avoid.
Authors
Matthew J Lavin
Regression Analysis with Scikit-learn (part 2 - Logistic)
EN
This lesson is the second in a two-part lesson focusing on regression analysis. It provides an overview of logistic regression, how to use Python (Scikit-learn) to make a logistic regression model, and a discussion of interpreting the results of such analysis.
Authors
Matthew J Lavin
Scalable Reading of Structured Data
EN
In this lesson, you will be introduced to 'scalable reading' and how to apply this workflow to your analysis of structured data.
Authors
Max Odsbjerg Pedersen
Josephine Møller Jensen
Victor Harbo Johnston
Sentiment Analysis with 'syuzhet' using R
EN
This lesson teaches you how to obtain and analyse narrative texts for patterns of sentiment and emotion. It uses the 'syuzhet' sentiment analysis algorithm with the programming language R, demonstrating each technique so that learners can follow along.
Authors
Jennifer Isasi
Text Mining YouTube Comment Data with Wordfish in R
EN
In this lesson, you will learn how to download YouTube video comments and use the R programming language to analyze the dataset with Wordfish, an algorithm designed to identify opposing ideological perspectives within a corpus.
Authors
Alex Wermer-Colan
Nicole Lemire Garlic
Jeff Antsen
Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision
EN
Tools for machine transcription of handwriting are practical and labour-saving if you need to analyse or present text in digital form. This lesson will explain how to write a Python program to transcribe handwritten documents using Microsoft's Azure Cognitive Services, a commercially available service that has a cost-free option for low volumes of use.
Authors
Jeff Blackadar
Understanding and Creating Word Embeddings
EN
Word embeddings allow you to analyze the usage of different terms in a corpus of texts by capturing information about their contextual usage. Through a primarily theoretical lens, this lesson will teach you how to prepare a corpus and train a word embedding model. You will explore how word vectors work, how to interpret them, and how to answer humanities research questions using them.
Authors
Avery Blankenship
Sarah Connell
Quinn Dombrowski
Working with Named Places: How and Why to Build a Gazetteer
EN
A digital gazetteer records information associated with specific places. This lesson teaches you how to create a gazetteer from a historical text, using the Linked Places Delimited (LP-TSV) format.
Authors
Susan Grunewald
Ruth Mostern