
Finding Places in Text with the World Historical Gazetteer

Researchers often need to search a corpus of texts for a defined list of terms, and historians are frequently interested in the places named in a text or set of texts. This lesson details how to programmatically search documents for a list of terms, including place names. We first produce a tab-separated value (TSV) file in which each row gives a matched term and its location in the text. We then generate a visualisation that can be used to interpret the matches in context and to assess their usefulness for a given project. The goal of the lesson is to systematically search a text corpus for place names and then use the World Historical Gazetteer to locate and map historic place names.
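To make the first step concrete, here is a minimal Python sketch of searching a text for a list of place names and writing each match, with its character offsets, to TSV. It is not the lesson's own code: the sample sentence, the place-name list, the column names, and the helper functions `find_terms` and `to_tsv` are all hypothetical, and it uses plain regular expressions from the standard library rather than any particular NLP toolkit.

```python
import csv
import io
import re


def find_terms(text, terms):
    """Return (term, start, end) for every whole-word match, in text order."""
    # Try longer terms first so a multi-word name is preferred over a
    # shorter name that begins at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b"
    )
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]


def to_tsv(matches):
    """Serialise matches as TSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["term", "start", "end"])  # hypothetical column names
    writer.writerows(matches)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample_text = "The train left Omsk for Nizhny Novgorod, then returned to Omsk."
place_names = ["Omsk", "Nizhny Novgorod", "Novgorod"]

matches = find_terms(sample_text, place_names)
tsv_output = to_tsv(matches)
print(tsv_output)
```

Each row records the matched term and the start and end offsets of the match, so `sample_text[start:end]` recovers the term in context. Exact string matching of this kind will miss spelling variants and inflected forms, which is one reason to review matches in context before relying on them.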

This lesson will be useful for anyone wishing to perform named entity recognition (NER) on a text corpus. Other users may wish to skip the text extraction portion and focus solely on the spatial elements of the lesson, that is, building a gazetteer and using the World Historical Gazetteer (WHG). These spatial steps are especially useful for anyone looking to create maps depicting historical information in a largely point-and-click interface. We have designed this lesson to show how to combine text analysis with mapping, but we understand that some readers may be interested in only one of these two methodologies. If you have time, we urge you to try both parts of the lesson together, as this will show you how text analysis and mapping can be combined in one project. It will also demonstrate how the results of these two activities can be ported into another form of digital analysis.

Learning Outcomes

After completing this lesson, you will be able to:

  • Programmatically search documents for a list of terms, including place names
  • Produce a tab-separated value (TSV) file where each row gives the matched term and the term’s location in the text
  • Generate a visualisation that can be used to interpret the matches in context
  • Combine text analysis with mapping
Interested in learning more?

Check out this lesson on Programming Historian's website


Cite as

Susan Grunewald, Andrew Janco, Eleni Gadolou and Randa El Khatib (2022). Finding Places in Text with the World Historical Gazetteer. Version 1.0.0. Edited by Anna-Maria Sichani. ProgHist Ltd. [Training module]. https://doi.org/10.46430/phen0096

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
Finding Places in Text with the World Historical Gazetteer
Authors:
Susan Grunewald, Andrew Janco
Domain:
Social Sciences and Humanities
Language:
en
Published:
12/19/2023
Content type:
Training module
Licence:
CC BY 4.0
Sources:
Programming Historian
Topics:
Open access, Open education, Natural Language Processing, Spatial humanities
Version:
1.0.0