
Scalable Reading of Structured Data

In this lesson, we introduce a workflow for scalable reading of structured data, combining close interpretation of individual data points and statistical analysis of the entire dataset. The lesson is structured in two parallel tracks:

  • A general track, which suggests a way to work analytically with structured data, using distant reading of a large dataset as context for close reading of distinctive data points.
  • An example track, in which we use simple functions in the programming language R to analyze Twitter data.

Combining these two tracks, we show how scalable reading can be used to analyze a wide variety of structured data. Our suggested scalable reading workflow includes two distant reading approaches that help researchers explore and analyze overall features of large datasets (chronologically and in relation to binary structures), plus a way of using distant reading to select individual data points for close reading in a systematic and reproducible manner.
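To make the workflow above more concrete, here is a minimal sketch in R. It is not taken from the lesson: the toy data frame `posts`, its column names, and the choice of the dplyr package are all assumptions made for illustration. It shows a chronological overview, a comparison across a binary structure, and a reproducible rule for selecting data points for close reading.

```r
library(dplyr)

# Hypothetical toy dataset standing in for any structured data
# (column names and values are invented for illustration).
posts <- data.frame(
  id       = 1:6,
  date     = as.Date(c("2021-01-03", "2021-01-03", "2021-01-04",
                       "2021-01-04", "2021-01-04", "2021-01-05")),
  verified = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  likes    = c(120, 4, 15, 300, 2, 48),
  text     = c("post a", "post b", "post c", "post d", "post e", "post f")
)

# Distant reading 1: chronological overview (number of posts per day).
per_day <- posts %>%
  count(date)

# Distant reading 2: binary structure (share of verified vs. non-verified).
by_verified <- posts %>%
  count(verified) %>%
  mutate(share = n / sum(n))

# Selection for close reading: filter to one side of the binary,
# arrange by a measurable feature, and keep the top two rows.
close_reading <- posts %>%
  filter(!verified) %>%
  arrange(desc(likes)) %>%
  head(2)

print(per_day)
print(by_verified)
print(close_reading)
```

Because the selection rule is written as code (non-verified posts, ordered by likes, top two), anyone re-running it on the same data gets the same posts to read closely. In this toy data those are the posts with `id` 6 and 3.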

Learning outcomes

After completing this lesson, you will be able to:

  • Set up a workflow where exploratory, distant reading is used as a context to guide the selection of individual data points for close reading
  • Employ exploratory analyses to find patterns in structured data
  • Apply and combine basic filtering and arranging functions in R

Interested in learning more?

Check out this lesson on Programming Historian's website

Cite as

Max Odsbjerg Pedersen, Josephine Møller Jensen, Victor Harbo Johnston, Alexander Ulrich Thygesen, Helle Strandgaard Jensen, Tiago Sousa Garcia and Frédéric Clavert (2022). Scalable Reading of Structured Data. Version 1.0.0. Edited by James Baker. ProgHist Ltd. [Training module]. https://doi.org/10.46430/phen0103

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
Scalable Reading of Structured Data
Authors:
Max Odsbjerg Pedersen, Josephine Møller Jensen, Victor Harbo Johnston, Alexander Ulrich Thygesen, Helle Strandgaard Jensen
Domain:
Social Sciences and Humanities
Language:
en
Published:
4/15/2024
Content type:
Training module
Licence:
CC BY 4.0
Sources:
Programming Historian
Topics:
Data modeling, Data management
Version:
1.0.0