Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision

Handwritten documents are appealing artifacts and a mainstay of research for many historians. Sources such as diaries, letters, logbooks and reports connect historians to writers not only through the writer’s words, but also through their individual writing style. However, research involving large amounts of these documents represents a significant challenge: transcription of documents into digital form makes them more searchable, but hand transcription is very time-consuming. While historians have been able to digitize physical typewritten documents using optical character recognition (OCR), handwriting, with its individual styles, has until recently resisted recognition by computers.

While training a customized handwriting recognition model is possible and sometimes required, it remains very difficult. Fortunately, ready-trained handwriting recognition services are available commercially: Microsoft, Google Cloud Platform and Amazon Web Services all offer them over the web. These services give historians a faster way to transcribe handwritten documents, as long as the documents are legible and written in a writing system the service recognizes.

For this lesson, we will use Microsoft’s Azure Cognitive Services to transcribe handwriting. The service can transcribe typed text, handwriting, or a combination of both, and it works on diaries, letters, forms, logbooks and research notes. Transcription with Azure Cognitive Services is well documented but does require some programming, hence this lesson.

Reviewed by:

  • Maria Dermentzi
  • Megan S. Kane

Learning outcomes

After completing this lesson, you will be able to:

  • Write a Python program to transcribe images of handwritten documents using Microsoft’s Azure Cognitive Services, a commercially available service that has a cost-free option for low volumes of use
  • Add steps to process multiple images at once and store the transcribed text in a file
  • Better organise your code using a Python function
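
The second and third outcomes above can be sketched roughly as follows. This is a minimal illustration of the overall shape only, not the lesson's own code: the names `transcribe_images` and `fake_transcribe` are hypothetical, and `fake_transcribe` is a placeholder standing in for the actual call to the Azure service, which is not shown here.

```python
from pathlib import Path
from typing import Callable, Iterable


def transcribe_images(
    image_paths: Iterable[Path],
    transcribe_fn: Callable[[Path], str],
    output_path: Path,
) -> int:
    """Transcribe each image and store all of the text in one file.

    transcribe_fn is any function that takes an image path and returns
    its text. Returns the number of images processed.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for image_path in image_paths:
            text = transcribe_fn(image_path)
            # Label each transcription with the name of its source image.
            out.write(f"--- {image_path.name} ---\n{text}\n\n")
            count += 1
    return count


def fake_transcribe(image_path: Path) -> str:
    # Hypothetical stand-in (assumption): a real version would send the
    # image to the recognition service and return the recognised text.
    return f"(transcribed text of {image_path.name})"
```

Calling `transcribe_images([Path("page1.jpg"), Path("page2.jpg")], fake_transcribe, Path("transcriptions.txt"))` would write one labelled entry per image to `transcriptions.txt`. Passing the transcription step in as a function keeps the loop and the file handling separate from the service call, so the placeholder can later be swapped for a real one.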

Interested in learning more?

Check out this lesson on Programming Historian's website

Cite as

Jeff Blackadar (2023). Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision. Version 1.0.0. Edited by Giulia Taurino. ProgHist Ltd. [Training module]. https://doi.org/10.46430/phen0114

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter

Full metadata

Title:
Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision
Authors:
Jeff Blackadar
Domain:
Social Sciences and Humanities
Language:
en
Published to DARIAH-Campus:
12/10/2024
Originally published:
12/6/2023
Content type:
Training module
Licence:
CC BY 4.0
Sources:
Programming Historian
Topics:
Python, Automatic Text Recognition
Version:
1.0.0