Exploring Topics and Contents in Text Collections
Contact Person: Sina Bock
Description
LDA (Latent Dirichlet Allocation) topic modeling is a method for analyzing the distribution of semantic word clusters, so-called “topics”, in a text collection. It can be used to explore the content of a corpus and to generate content-related features for computational text classification. Topic modeling relies entirely on the analyzed texts themselves; it does not use additional sources of information such as dictionaries or external training data, which makes it largely independent of language and orthographic convention. It is based solely on a statistical analysis of word-level co-occurrence, which is interpreted as an indication of likely semantic relationships. In terms of its requirements on text type and text quality, topic modeling is therefore one of the more flexible methods in computational text analysis.
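To make the idea concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python. It is an illustration only and not the implementation used by the TopicsExplorer; the function names (`tokenize`, `lda_gibbs`), the toy corpus and the hyperparameter values are assumptions chosen for this example.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (real pipelines also drop stopwords)."""
    return text.lower().split()


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with a simple collapsed Gibbs sampler.

    Returns (theta, top_words): theta[d][k] is the estimated share of topic k
    in document d; top_words[k] lists up to three frequent words of topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample: topics common in this document and topics that
                # already contain this word both become more likely.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + alpha * num_topics) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return theta, top_words


if __name__ == "__main__":
    corpus = [
        "ship sea captain sea storm",
        "captain ship harbor sea",
        "garden flower rose garden",
        "rose flower spring garden",
    ]
    theta, top_words = lda_gibbs([tokenize(text) for text in corpus])
    for k, words in enumerate(top_words):
        print("topic", k, words)
    for text, shares in zip(corpus, theta):
        print([round(s, 2) for s in shares], text)
```

The sampler uses nothing but the word counts of the corpus itself, which is why the method needs no dictionaries or training data. Results depend on the random seed and on the number of topics, so different runs can group words differently.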
The TopicsExplorer is a beginner-oriented software tool that allows interested researchers to experiment with topic modeling on their own computers, with their own text corpora. The entire workflow from plain text to visualized results, some of them presented as interactive graphs, takes place in a graphical user interface (GUI). The software thus allows users without programming skills to load a collection of plain-text or even XML files and to analyze it by means of the LDA algorithm. It is implemented as stand-alone software that runs on common Windows, macOS and Linux operating systems without installation. Users can scan texts for recurring, semantically meaningful groups of words; they can explore how much each of these semantic groups contributes to each text, and which texts probably share common themes and topics. For further analysis with other software, results can be exported in the universally readable CSV format.
The TopicsExplorer is primarily designed as an educational tool for both classroom and self-learner use. It allows users without prior knowledge or skills to quickly engage and experiment with topic modeling. Users can thereby learn about the possibilities and limitations of the method without great effort.
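As an illustration of such further analysis, the sketch below reads a document-topic table from CSV and finds the text whose topic profile is closest to a given one. The column layout, file names and values are hypothetical and were made up for this example; the actual layout of the TopicsExplorer export may differ and should be checked before adapting the code.

```python
import csv
import io
import math

# Hypothetical export: one row per document, one column per topic share.
EXPORT = """document,topic_0,topic_1,topic_2
novel_a.txt,0.70,0.20,0.10
novel_b.txt,0.65,0.25,0.10
essay_c.txt,0.05,0.15,0.80
"""


def load_document_topics(handle):
    """Map each document name to its list of topic shares."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: [float(x) for x in row[1:]] for row in reader if row}


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(name, table):
    """Return (other_document, similarity) for the closest topic profile."""
    candidates = (
        (other, cosine(table[name], shares))
        for other, shares in table.items()
        if other != name
    )
    return max(candidates, key=lambda pair: pair[1])


def dominant_topic(name, table):
    """Return the index of the topic with the largest share in a document."""
    shares = table[name]
    return max(range(len(shares)), key=lambda k: shares[k])


table = load_document_topics(io.StringIO(EXPORT))
print(most_similar("novel_a.txt", table))
print(dominant_topic("essay_c.txt", table))
```

To use a real export, replace `io.StringIO(EXPORT)` with an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the exported CSV file.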
Website: https://de.dariah.eu/en/web/guest/topicsexplorer