Corpus Analysis

Corpus analysis is the process of interrogating large collections of language data (corpora) for purposes such as linguistic investigation or machine learning.

Understanding and Creating Word Embeddings
EN
Word embeddings allow you to analyze the usage of different terms in a corpus of texts by capturing information about their contextual usage. Through a primarily theoretical lens, this lesson will teach you how to prepare a corpus and train a word embedding model. You will explore how word vectors work, how to interpret them, and how to answer humanities research questions using them.
Authors
Avery Blankenship
Sarah Connell
Quinn Dombrowski
Digitization Workflow: Talk with Sorin Marti, a Data Steward's Perspective
EN
In this podcast, produced by virtualculture.ch, sociologist Jane Haller, president of Digitales Schaudepot, talks with Sorin Marti, a data steward in the Research Infrastructure Support Entity (RISE) at the University of Basel, about aspects of data management for public consumption.
Authors
Vera Chiquet
Jane Haller
Sorin Marti
Corpus Query Language im Austrian Media Corpus
DE
This resource introduces the Austrian Media Corpus (amc) and the ways it can be used. It explains how to run queries in the corpus query engine Sketch Engine, with a particular focus on introducing Sketch Engine's "Corpus Query Language" (CQL). The tutorial aims to give users of the Austrian Media Corpus (amc) an easy entry point to querying the amc with Sketch Engine and CQL. For that reason, the tutorial itself is deliberately written in German. All examples in the tutorial are taken directly from the amc.
Authors
Hannes Pirker
