Topic modelling is a machine-learning technique that finds patterns in language use within a corpus of documents and clusters those documents accordingly. The commonalities in these patterns form “topics”, providing a way to automatically categorise documents by their structural content. The specific type of topic modelling covered in this resource is Latent Dirichlet Allocation (LDA).
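To make the idea concrete, here is a minimal sketch of LDA written as a toy collapsed Gibbs sampler using only the Python standard library. It is not the code from the EHRI notebook, which may use a dedicated library instead. The function name `fit_lda`, the hyperparameter values and the six tiny documents are all invented for illustration.

```python
import random


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenised documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Weight each topic by how much the document and the word favour it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + beta * n_vocab)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    doc_topic_dist = [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # The three highest-count words for each topic.
    top_words = [
        [vocab[i] for i in sorted(range(n_vocab), key=lambda i: -row[i])[:3]]
        for row in topic_word
    ]
    return doc_topic_dist, top_words


# Invented toy corpus (already tokenised); real transcripts would need cleaning first.
docs = [
    "train station journey train ticket".split(),
    "journey station ticket train".split(),
    "letter family write letter home".split(),
    "family home write letter".split(),
    "train journey ticket station".split(),
    "home family letter write".split(),
]

doc_topics, top_words = fit_lda(docs)
for row in doc_topics:
    print([round(p, 2) for p in row])
print(top_words)
```

Each row of `doc_topics` gives one document's estimated mix of topics, and `top_words` lists the most frequent words assigned to each topic. On a real corpus the number of topics and the smoothing values `alpha` and `beta` would need to be chosen with care, and the resulting topics still need human interpretation.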
This EHRI notebook walks readers through using Python to topic model transcripts obtained through the United States Holocaust Memorial Museum (USHMM). It accompanies the article “Exploratory Topic Modelling in Python”, published on the European Holocaust Research Infrastructure (EHRI) Document Blog.
After viewing this training resource, users will be able to:
- Understand the basic concepts of topic modelling
- Walk through the process of topic modelling in Python
Check out Exploratory Topic Modelling in Python