DYLEN: Diachronic Dynamics of Lexical Networks

Introduction

DYLEN stands for Diachronic Dynamics of Lexical Networks (Baumann et al. 2019). It is an interactive visualisation tool (Yim et al. 2022) created by the project team of the same name to provide insights into lexical change in Austrian German during the 21st century. It helps lexicographers and linguists analyse how Austrian German lexemes develop over time. The tool is open source and free to use.

Learning Outcomes

After completing this resource, learners will be able to:

  • understand the purpose of DYLEN

  • read a visualisation created in DYLEN

  • carry out an ego network analysis with DYLEN

  • generate a general network analysis

Diachronic Dynamics of Lexical Networks

DYLEN enables lexical network research on large-scale authentic language data taken from two Austrian German corpora: the Austria Media Corpus (amc; Ransmayr et al. 2017) and the Corpus of Austrian Parliamentary Records (ParlAT; Wissik and Pirker 2018).

DYLEN provides three options:

  • Ego network,

  • General network (party),

  • General network (speaker),

and two additional components:

  • Node metrics comparison,

  • Time series analysis.

Screenshot of the DYLEN user interface.

The following comic provides a visual summary of this article and illustrates the key features of the DYLEN tool.

Networks

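The diachronic networks are derived from the texts in amc and ParlAT with the help of word embeddings. In natural language processing (NLP), word embeddings are numerical vector representations of words; words that occur in similar contexts receive similar vectors.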
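To make the idea of "similar vectors" concrete, here is a minimal sketch that compares toy word vectors with cosine similarity, a common similarity measure for embeddings. The vectors, the dictionary EMBEDDINGS and both function names are invented for illustration; the source does not state which embedding model or similarity measure DYLEN uses.

```python
import math

# Toy 3-dimensional vectors, invented for illustration. Real embeddings are
# learned from corpus text and usually have far more dimensions.
EMBEDDINGS = {
    "Geld": [0.9, 0.1, 0.2],
    "Euro": [0.8, 0.2, 0.3],
    "Strafe": [0.3, 0.9, 0.1],
    "Haft": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(target, embeddings, k):
    """Return the k words most similar to the target, excluding the target."""
    scores = [
        (word, cosine_similarity(embeddings[target], vector))
        for word, vector in embeddings.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    for word, score in nearest_neighbours("Geld", EMBEDDINGS, k=2):
        print(f"{word}: {score:.3f}")
```

With these toy values, "Euro" ranks as the closest neighbour of "Geld", followed by "Strafe".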

The user interface is intuitive. Every search starts with a choice between an ego network and a general network (party or speaker). In each network type, you can analyse various parameters of a single entity or compare two entities. The first step on your diachronic network journey is therefore to select the type of network that you would like to generate.

Ego Network

Connected words are semantic neighbours that share some aspects of meaning with the target word. Some can even substitute for the target word in a particular context. The ego network visualises the 50 most closely related semantic neighbours of a target word. Note that it does not show the target word itself, because this would make the visualisation impossible to read. The semantic neighbours are classified by part of speech (POS), e.g. nouns, proper nouns and verbs.
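The selection described above (keep the most similar words, leave out the target, group by part of speech) can be sketched as follows. The candidate records, similarity scores, POS tags and function names are hypothetical and serve only to illustrate the logic; they are not taken from DYLEN.

```python
from collections import defaultdict

# Hypothetical (word, similarity to target, part of speech) records.
CANDIDATES = [
    ("Geld", 1.00, "NOUN"),  # the target word itself
    ("Euro", 0.91, "NOUN"),
    ("zahlen", 0.84, "VERB"),
    ("Kredit", 0.80, "NOUN"),
    ("sparen", 0.77, "VERB"),
    ("Wien", 0.52, "PROPN"),
]


def ego_neighbours(target, candidates, k=50):
    """Keep the k most similar words, leaving out the target itself."""
    others = [record for record in candidates if record[0] != target]
    others.sort(key=lambda record: record[1], reverse=True)
    return others[:k]


def group_by_pos(neighbours):
    """Group neighbour words by their part-of-speech tag."""
    groups = defaultdict(list)
    for word, _similarity, pos in neighbours:
        groups[pos].append(word)
    return dict(groups)


if __name__ == "__main__":
    top = ego_neighbours("Geld", CANDIDATES, k=3)
    print(group_by_pos(top))
```

The default of k=50 mirrors the 50 neighbours shown in the ego network; the example call uses k=3 only because the toy list is short.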

A graph, two line graphs to show the semantic neighbours, node metrics and time series analysis for the word 'Geld' in the amc texts in 1996.

Ego network of the word “Geld” (money), taken from the amc texts in 1996.

Instructions:

Input field for ego network

In the input field in the left sidebar, you can

  1. select a corpus (i.e., amc or ParlAT),

  2. select a subcorpus (e.g., a specific newspaper),

  3. type a target word (e.g., ‘Geld’),

  4. and finally click Visualise.

Understanding the Visualisation

Once you have clicked Visualise, DYLEN will generate your network. Let us stick with our “Geld” (money) example.

The ego network for 'Geld': differently sized nodes connected by lines, a timeline on the top and parts of speech in different colours

The ego network for “Geld” (money)

Above, you see the semantic neighbours represented as nodes, which can be dragged further apart to get a better overview. The size of a node indicates its frequency: the bigger the node, the more frequently the word is used in the corpus. You can click on a node to highlight its connections. The colours represent different parts of speech, and you can change them to your preference.

Time Series Analysis

The Time Series Analysis allows you to compare two words over time; the comparison can be relative to the first year, the last year or the previous year.
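The three reference points can be illustrated with a small sketch. It assumes that "relative to" means the simple difference between a year's value and the reference year's value; the source does not state the exact formula DYLEN applies, and the function name and the yearly values below are invented.

```python
def relative_series(values, baseline="first"):
    """Express each yearly value as a difference from a reference value.

    values:   dict mapping year -> metric value
    baseline: "first", "last" or "previous"
    """
    years = sorted(values)
    result = {}
    for index, year in enumerate(years):
        if baseline == "first":
            reference = values[years[0]]
        elif baseline == "last":
            reference = values[years[-1]]
        elif baseline == "previous":
            # The first year has no predecessor, so it is compared with itself.
            reference = values[years[index - 1]] if index > 0 else values[year]
        else:
            raise ValueError(f"unknown baseline: {baseline}")
        result[year] = values[year] - reference
    return result


if __name__ == "__main__":
    # Invented yearly values for one word.
    series = {1996: 120, 1997: 150, 1998: 135}
    for mode in ("first", "last", "previous"):
        print(mode, relative_series(series, baseline=mode))
```

Running the same function on a second word and plotting both results side by side would give the kind of two-word comparison described above.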

Metrics and Node Metric Comparison

A bar with five sliders

Parallel coordinates options: normalised frequency, degree centrality, betweenness centrality, PageRank, clustering coefficient.

In addition, you can use the sliders to select which metrics appear in the parallel coordinates plot. Each line in the plot represents one word, and each vertical axis shows the values of one metric. You can display all words or only selected words in the node metrics comparison. When you click a line, you can inspect the word's value for each metric.
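As an illustration of what two of these metrics measure, the following sketch computes degree centrality and the local clustering coefficient using their standard textbook definitions. The toy edges and function names are invented, and DYLEN's own implementation may differ in detail.

```python
from itertools import combinations

# Toy undirected network; the edges are invented for illustration.
EDGES = [
    ("Strafe", "Haft"),
    ("Strafe", "Gefängnis"),
    ("Haft", "Gefängnis"),
    ("Strafe", "verurteilen"),
]


def build_adjacency(edges):
    """Map every node to the set of nodes it is directly connected to."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def clustering_coefficient(adjacency):
    """Share of a node's neighbour pairs that are themselves connected."""
    result = {}
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
        result[node] = links / (k * (k - 1) / 2)
    return result


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(degree_centrality(graph))
    print(clustering_coefficient(graph))
```

In this toy network, "Strafe" is connected to every other word and so has the highest degree centrality, while "Haft" has the highest clustering coefficient because both of its neighbours are connected to each other.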

Four lines cut by five axes for the words: verurteilen, Strafe, Haft and Gefängnis

Lines and five axes for the words “verurteilen”, “Strafe”, “Haft” and “Gefängnis” when analysing an ego network for “Geldstrafe” in 1996.

General Networks

General networks reflect the speeches of a particular politician or political party. These networks are larger than ego networks and need more filtering to keep the visualisation legible. Under general networks, you can explore frequent lexemes used by particular political parties (general network (party)) or individual politicians (general network (speaker)) in the Austrian Parliament.

Instructions:

In the input field in the left sidebar, you can

  1. select a party,

  2. select a speaker (only for general network (speaker)),

  3. (optional, but recommended) use the Node filter to

    • select a metric (e.g. degree centrality),

    • adjust the percentage of nodes to be displayed,

  4. and finally click Visualise.
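The effect of the Node filter can be sketched as follows: rank the nodes by the selected metric and keep only the top share. The function name, the rounding rule (rounding up) and the metric values are assumptions made for this example; the source does not describe how DYLEN ranks or rounds.

```python
import math


def filter_top_nodes(metric_values, percentage):
    """Keep the given percentage of nodes with the highest metric values."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: round up, so a small non-zero percentage keeps one node.
    keep = math.ceil(len(metric_values) * percentage / 100)
    ranked = sorted(metric_values, key=metric_values.get, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Invented degree centrality values for four words.
    degree = {"brauchen": 0.9, "Arbeit": 0.7, "Land": 0.5, "sagen": 0.2}
    print(filter_top_nodes(degree, 50))
```

Lowering the percentage leaves fewer, more central nodes on screen, which is why the filter is recommended for the larger general networks.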

Four visualisations for general network (party)

General network (party) comparison for SPÖ and ÖVP in 2000. The word “brauchen” (need) and its connections are highlighted in the first network visualisation.

Node Metrics Comparison

The general network analysis also allows node metrics comparison. You can choose between the same metrics as in the ego network. When you compare two parties or speakers, each of them is shown in a different colour. You can also ask DYLEN to return a colour-coded table of the node metrics (see below).

Table with metric columns showing words in alphabetical order and the metric values.

The table lists the nodes and their values for the respective metrics.

Time Series Analysis

In the general network analysis, you can trace the development of speakers or parties, just as the ego network traces individual words. You can visualise your results as a graph on a timeline or as a table with selected metrics and values. All analysis options are explained in more detail on the DYLEN website, in the technical details under the Time Series Analysis tab.

Knowledge Test

Practical Understanding and Exercises

Please use our DYLEN tool integration or open the tool in a new tab to complete the following tasks. If you are on a mobile device, use the hamburger menu in the top right corner to open the DYLEN tool’s navigation bar, which includes links to “Ego Network” etc.

Conclusion

DYLEN offers an interactive visualisation platform for analysing lexical change in Austrian German in the 21st century and supports in-depth research into the evolving dynamics of language.

By utilising data from the Austria Media Corpus (amc) and the Corpus of Austrian Parliamentary Records (ParlAT), DYLEN provides valuable insights into how Austrian German lexemes have developed over time. Its open-source nature and user-friendly interface make it an accessible tool for lexicographers and linguists, facilitating both ego network and general network analyses.

The tool’s features, including node metrics comparison and time series analysis, allow researchers to explore linguistic data comprehensively and track lexical trends with precision. As a result, DYLEN not only enhances our understanding of language evolution but also underscores the importance of innovative tools in advancing linguistic research. Its contribution to the field highlights the growing potential of interactive visualisations in studying diachronic language dynamics.

Side Note

This post was originally published on 17 February 2023 by the ACDH-CH.

About the DYLEN Project

DYLEN Tool

DYLEN Comic

HowTo use the amc and CQL

References

  • Baumann, Andreas, Julia Neidhardt, and Tanja Wissik. 2019. DYLEN: Diachronic Dynamics of Lexical Networks. In Proceedings of the Poster Session of the 2nd Conference on Language, Data and Knowledge (LDK-PS 2019), ed. Thierry Declerck and John P. McCrae, 2402:24–28. CEUR Workshop Proceedings. Leipzig, Germany: CEUR.

  • Ransmayr, Jutta, Karlheinz Mörth, and Matej Ďurčo. 2017. AMC (Austrian Media Corpus) – Korpusbasierte Forschungen zum österreichischen Deutsch. In Digitale Methoden der Korpusforschung in Österreich (= Veröffentlichungen zur Linguistik und Kommunikationsforschung Nr. 30), ed. C. Resch and W. U. Dressler, 27–38. Wien: Verlag der Österreichischen Akademie der Wissenschaften.

  • Wissik, Tanja, and Hannes Pirker. 2018. ParlAT beta Corpus of Austrian Parliamentary Records. In LREC2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation LREC2018, ed. Darja Fišer, Maria Eskevich, and Franciska de Jong. Miyazaki: European Language Resources Association.

  • Yim, Seung-bin, Katharina Wünsche, Asil Cetin, Julia Neidhardt, Andreas Baumann, and Tanja Wissik. 2022. Visualizing Parliamentary Speeches as Networks: the DYLEN Tool. In Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, ed. Darja Fišer, Maria Eskevich, Jakob Lenardič, and Franciska de Jong, 56–60. Marseille, France: European Language Resources Association.

Cite as

Elisabeth Königshofer and Katharina Wünsche (2024). DYLEN: Diachronic Dynamics of Lexical Networks. Version 1.0.0. Edited by Elena Zotou. DARIAH-Campus. [Training module]. https://campus.dariah.eu/id/uTbN0NGu8c1pnt1HH5Vz7

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title: DYLEN: Diachronic Dynamics of Lexical Networks
Authors: Elisabeth Königshofer, Katharina Wünsche
Domain: Social Sciences and Humanities
Language: en
Published to DARIAH-Campus: 9/12/2024
Content type: Training module
Licence: CC BY 4.0
Sources: ACDH-CH
Topics: Data management
Version: 1.0.0