Big data refers to extremely large and/or complex datasets, and to the methods used to manage and analyse them.
Many Galleries, Libraries, Archives, and Museums (GLAMs) face difficulties sharing their collections metadata in standardised and sustainable ways, meaning that staff rely on more familiar general-purpose office programs such as spreadsheets. However, while these tools offer a simple approach to data registration and digitisation, they don’t allow for more advanced uses. This blogpost from EHRI explains a procedure for producing EAD (Encoded Archival Description) files from an Excel spreadsheet using OpenRefine.
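The blogpost itself walks through OpenRefine's templating export; purely as an illustration of the underlying idea (mapping tabular rows onto nested EAD components), here is a minimal Python sketch using only the standard library. The column names and sample rows are invented for the example and are not the blog's actual template.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative stand-in for a spreadsheet export; real collections
# metadata would have many more columns.
SAMPLE_CSV = """identifier,title,date
C001,Correspondence files,1939-1942
C002,Deportation lists,1941
"""

def rows_to_ead(csv_text):
    """Convert tabular metadata rows into a minimal EAD-like XML tree.

    Each spreadsheet row becomes a file-level component (<c>) under the
    collection-level description (<archdesc>/<dsc>).
    """
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    dsc = ET.SubElement(archdesc, "dsc")
    for row in csv.DictReader(io.StringIO(csv_text)):
        c = ET.SubElement(dsc, "c", level="file")
        did = ET.SubElement(c, "did")
        ET.SubElement(did, "unitid").text = row["identifier"]
        ET.SubElement(did, "unittitle").text = row["title"]
        ET.SubElement(did, "unitdate").text = row["date"]
    return ET.tostring(ead, encoding="unicode")

print(rows_to_ead(SAMPLE_CSV))
```

A full EAD export would also need the `<eadheader>` and namespace declarations required by the schema; the sketch only shows the row-to-component mapping.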
The Fortunoff Visual Search is a tool for both data visualisation and collection discovery in the Fortunoff Video Archive for Holocaust Testimonies. This blogpost demonstrates the Visual Search tool, including its search and filtering interface, and shows how to interpret the resulting visualisations.
This blog discusses the applicability of services such as automatic metadata generation and semantic annotation for automatically extracting person names and locations from large datasets. This is demonstrated using Oral History Transcripts provided by the United States Holocaust Memorial Museum (USHMM).
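The services described in the blog rely on trained named-entity recognition models; as a much simpler illustration of the same task (pulling person names and locations out of transcript text), here is a toy gazetteer-matching sketch. The gazetteer entries and sample sentence are invented for the example.

```python
# Toy gazetteer: known names grouped by entity type. Real annotation
# services use statistical NER models rather than fixed lists.
GAZETTEER = {
    "person": {"Anne Frank", "Primo Levi"},
    "location": {"Prague", "Warsaw", "Auschwitz"},
}

def extract_entities(text, gazetteer):
    """Return sorted (entity_type, name) pairs found in the text."""
    found = []
    for etype, names in gazetteer.items():
        for name in names:
            if name in text:
                found.append((etype, name))
    return sorted(found)

sample = "The witness was born in Prague and later deported to Auschwitz."
print(extract_entities(sample, GAZETTEER))
# → [('location', 'Auschwitz'), ('location', 'Prague')]
```

Gazetteer matching misses spelling variants and unseen names, which is precisely why the blog's model-based services are interesting for large, noisy transcript collections.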
In the late 1930s, just before war broke out in Europe, a series of chaotic deportations took place, expelling thousands of Jews from what is now Slovakia. As part of his research, Michel Frankl investigates the backgrounds of the deported people and the trajectory of the journey they were taken on. This practical blog describes the tools and processes of analysis, and shows how a spatially enabled database can be made useful for answering similar questions in the humanities, and in Holocaust Studies in particular.
This blog post from EHRI introduces 'quod' (querying OCRed documents), a prototype Python-based command-line tool for OCRing and querying digitised historical documents, which can be used to organise large collections and improve information about provenance. To demonstrate its use in context, the blog takes the reader through a case study of the International Tracing Service, showing the workflows and steps taken from start to finish.
This blog examines TEITOK, a corpus framework that can be used as an alternative to Omeka. TEITOK is centred on texts and offers an interface similar to Omeka's: both allow you to search through documents and display their transcriptions. The main difference is that Omeka treats the transcription as an object description, whereas TEITOK not only shows that a word appears in a document, but also where it appears and how it is used.
This keynote lecture delivered at the DARIAH Annual Event 2021 by Sarah Kenderdine explores how computation has become ‘experiential, spatial and materialized; embedded and embodied’.
In this lecture, Mark Cote takes us on a journey through a host of research projects that contextualise the way that he and other researchers approach cultural data.