In this blog post we introduce ‘quod’ (querying OCRed documents), a prototype Python-based command line tool for OCRing and querying digitised historical documents, which can be used to organise large collections. To demonstrate the tool in context, this post walks the reader through a case study of the International Tracing Service, showing the workflow and each step taken from start to finish.
The archives of the International Tracing Service (ITS) in Bad Arolsen, Germany, focus on wartime incarceration, forced labour, and liberated survivors. Their holdings include a collection of approximately 4.2 million document images that have been rearranged without regard to their provenance. This makes it difficult, or at least highly laborious, to determine which subcollection an image originally belonged to, or to organise the collection in other ways.
ITS has rearranged many of its collections to serve its tracing work. As a result, it is relatively easy to search the collection for the names of individuals. At the same time, the loss of provenance and context information makes it hard to approach the collections with research questions. For example, a researcher or archivist may want to identify the employers of forced labourers and even arrange the collections by employer.
After viewing this training resource, users will be able to:
- recognise how important metadata and provenance are in the management of archive materials
- understand how the prototype Python command line tool ‘quod’ can be used to organise large collections within archives
Check out: quod: A Tool for Querying and Organising Digitised Historical Documents