Automatic Text Recognition (ATR) - Step 2: Where and How to Get Images

In this tutorial, you will learn the typical methods for obtaining high-quality scanned images, which set the stage for successful text recognition. It covers obtaining digital copies from archives, digitisation needs, and copyright considerations. It also covers using online resources such as Google Books and Internet Archive while navigating public-domain copyright issues. Lastly, it highlights the Heritage Data Reuse Charter for collaboration between cultural heritage institutions and researchers.

Learning Outcomes

After completing this resource, learners will be able to:

  • Recognise the sources for acquiring suitable images for text recognition.
  • Assess the quality of images in terms of their suitability for ATR.
  • Implement strategies for collecting and organising images for processing.
  • Apply basic techniques for scanning and digitising textual materials.

You can read the blog post (available in English, French, and German) and watch our video (with subtitles in English, French, and German), which is embedded in the post.

Interested in learning more?

Check out "Automatic Text Recognition - Step 2: Where and How to Get Images"

Cite as

Anna Busch, David Lassner and Aneta Plzáková (2024). Automatic Text Recognition (ATR) - Step 2: Where and How to Get Images. Version 1.0.0. Edited by Anne Baillot and Mareike König. Deutsches Historisches Institut Paris. [Training module]. https://harmoniseatr.hypotheses.org/332

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
Automatic Text Recognition (ATR) - Step 2: Where and How to Get Images
Authors:
Anna Busch, David Lassner, Aneta Plzáková
Domain:
Social Sciences and Humanities
Language:
en
Published:
6/12/2024
Content type:
Training module
Licence:
CC BY 4.0
Sources:
DARIAH
Topics:
Editing tools, Machine Learning, Automatic Text Recognition
Version:
1.0.0