
Big data

Big data refers to extremely large or complex datasets and to the methods used to manage and analyse them.

Resources

  • Corpus Analysis with spaCy

    EN
    This lesson demonstrates how to use the Python library spaCy to analyze large collections of texts. It details the process of enriching a corpus with spaCy via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
  • Photogrammetry 3D Digitisation

    EN
    This resource is an introduction to photogrammetry, a technique for capturing visual data about cultural heritage assets and producing associated 3D models.
    Authors
    • Karina Rodriguez Echavarria
    • Myrsini Samaroudi
    • Nicola Schiavottiello
  • FAIR Multidimensional Data

    EN
    This resource offers a starting point for learning about the different types of multidimensional media and about managing media in a way that promotes the FAIR principles. It also introduces the concept of a Virtual Research Environment to support retrieval and curation of multidimensional data for storytelling via interoperable frameworks.
  • Digitisation with 360 Degrees Photography

    EN
    This resource is an introduction to 360-degree panorama photography. It explores different types of panoramic representations and examples of 360-degree panoramas in the cultural heritage domain. It also includes practical advice and step-by-step guidance on capturing and processing data to produce and publish 360-degree panorama images.
    Authors
    • Karina Rodriguez Echavarria
    • Nicola Schiavottiello
  • Data Ethics in Cultural Heritage

    EN
    This resource aims to introduce the main aspects of data ethics in the cultural heritage domain. It also examines how data management can be supported to become more ethical, while addressing topical discourse about data ethics in the sector. The resource also aims to support critical reflection on case studies with evident digital data ethics considerations.
  • Digitisation Methods for Material Culture

    EN
    This resource is an introduction to digitisation methods for material culture. It explores basic topics in the study of material culture, the types of media used to communicate and share information about it, and digitisation methods for capturing material culture data.
    Authors
    • Karina Rodriguez Echavarria
    • Myrsini Samaroudi
    • Nicola Schiavottiello
  • Creating Stories with 3D Data on the Web

    EN
    This resource provides guidance on how to use digital storytelling, deploying 3D data and annotations and combining media, to enable users to access and explore information about digital heritage assets over the web.
    Authors
    • Karina Rodriguez Echavarria
    • Nicola Schiavottiello
  • Text Analysis - Linguistics Meets Data Science

    EN
    What are the differences between a data scientist and a corpus linguist? This course provides an overview of the different perspectives on language and different types of tools that can be used for text analytics. It also introduces topic modelling and sentiment analysis as approaches to textual data.
  • The Learning Curve in Sharing Data with the EHRI Project

    EN
    A partnership between Kazerne Dossin and EHRI was established to enable sharing of metadata with a broader audience. This partnership resulted in changes to the practices of cataloguing archival materials within Kazerne Dossin. Using the example of the Lewkowicz family collection, this article focuses on the revolution Kazerne Dossin went through while standardising descriptions, and on the tools EHRI provided to optimise the workflow for collection-holding institutes.
    Authors
    • Dorien Styven
    • Marius Caragea
    • Veerle Vanden Daelen
  • Data Journalism and AI: New frontiers in investigation and storytelling

    EN
    Data is now an indispensable part of investigative work and storytelling for journalists and newsrooms. Computational methods and artificial intelligence are making their way into newsrooms more than ever before, and promise to open up new opportunities for journalists, as well as new challenges. This talk provides an overview of how data and artificial intelligence can be used in the journalism workflow, investigative reporting and storytelling.
  • What Can I Do With This Messy Spreadsheet? Converting from Excel Sheets to Fully Compliant EAD-XML files

    EN
    Many Galleries, Libraries, Archives, and Museums (GLAMs) face difficulties sharing their collections metadata in standardised and sustainable ways, meaning that staff rely on more familiar general-purpose office programs such as spreadsheets. However, while these tools offer a simple approach to data registration and digitisation, they don’t allow for more advanced uses. This blog post from EHRI explains a procedure for producing EAD (Encoded Archival Description) files from an Excel spreadsheet using OpenRefine.