This blog post from EHRI introduces 'quod' (querying OCRed documents), a prototype Python-based command-line tool for OCRing and querying digitised historical documents, which can be used to organise large collections and improve information about provenance. To demonstrate its use in context, the post takes the reader through a case study of the International Tracing Service, showing the workflows and the steps taken from start to finish.
This blog post examines TEITOK, a corpus framework used as an alternative to Omeka. TEITOK is centred on texts, and its interface is similar to Omeka's: both allow you to search through the documents and display the transcription. The main difference is that Omeka treats the transcription as an object description, whereas TEITOK shows not only that a word appears in a document, but also where it appears and how it is used.
This video tutorial provides a step-by-step guide to the DARIAH-DE Publikator, a tool that enables its users to upload data and datasets into the DARIAH-DE Repository and index them with metadata. The tool is part of the larger DARIAH-DE Data Federation Architecture, which aims to support the FAIRification of research data with regard to the research data life cycle.
The SSHOC-DARIAH Train-the-Trainer Research Data Management Bootcamp ('Research Data Management Bootcamp' for short) was held as two half-day workshops that gave participants access to experts in the field and allowed for real-time activities between the sessions. It was co-organised by the SSHOC project and the DARIAH 'Research Data Management' Working Group.
This tutorial explains the fundamentals of the DARIAH-DE Publikator, a tool that allows you to prepare, manage, and finally import your collections into the DARIAH-DE Repository using your favourite internet browser. The Repository provides the ability to store research data and enrich them with metadata. Persistent identifiers ensure a permanent, machine-readable reference that can be found via a generic search. The tutorial contains guides for users as well as technical documentation.
Thesauri, taxonomies and other forms of controlled vocabularies form a conceptual backbone of research and play an ever-increasing role in various aspects of the data management process. These resources are indispensable for establishing a common understanding: they make it possible to categorise and enrich research data systematically and consistently, and they foster data interoperability and integration among projects and web applications.