The Lexical Data Masterclass took place at the Berlin-Brandenburg Academy of Sciences in Berlin, Germany between the 4th and 8th of December 2017. It was co-organized by the Centre Marc Bloch, DARIAH-EU, the Berlin-Brandenburg Academy of Sciences (BBAW), Inria (Paris, France) and the Belgrade Center for Digital Humanities (BCDH, Serbia) with the support of the German Ministry of Education and Research (BMBF), CLARIN, DARIAH-DE and the H2020 European project “Humanities at Scale”.
The aim of the masterclass was to bring trainees and experts together to share experiences, methods and techniques for the creation, management and use of digital lexical data. The 20 participants were mentored by 7 tutors, who introduced specific methods in instructional sessions that were then applied during the masterclass:
- Alexander Geyken from BBAW (Berlin, Germany) and CLARIN-D, who presented the DWDS workflow;
- Laurent Romary from Inria (Paris, France) and DARIAH-EU, who presented an overview of lexical models and an introduction to the TEI dictionary chapter, as well as querying and presenting TEI dictionary data with XSLT;
- Benoît Sagot and Axel Herold from Inria (Paris, France), who discussed the representation of etymological processes;
- Toma Tasovac from the Belgrade Center for Digital Humanities (Serbia), who presented XPath for searching dictionaries as well as customizing oXygen for lexicographic work;
- Marie Puren from Inria (France), who presented data management practices and recommendations;
- Piotr Banski from the Institut für Deutsche Sprache (Germany), who presented the CLARIN infrastructure for lexical data.
Furthermore, two keynote speakers were invited to contribute to the masterclass: Frieda Steurs from KU Leuven and James Pustejovsky from Brandeis University.
The masterclass was conceived as a series of training and working sessions in which most of the knowledge transfer happens through concrete work on the participants’ projects. Here we reflect on what participants widely regarded as a very successful meeting by providing an overview of the instructional sessions and of the projects brought by the participants, as presented at the final symposium on 8th December.
A specific outcome of this masterclass was its contribution to the assessment of the standardization landscape, in particular the TEI guidelines. The week also highlighted the importance of raising awareness of data management issues, as well as the need to increase the number of openly available resources. Many more topics were raised, such as:
- Which reuse conditions apply to the various dictionaries, and how can a wide dissemination of the results, including the encoded source, be ensured?
- Which license should be attached to such resources, and how can smooth dissemination be ensured when the recommended Creative Commons CC-BY license is not applicable?
- How can we move towards a network of available lexica in the context of the stable hosting capacities offered by European infrastructures such as DARIAH and CLARIN?
- How can we build infrastructures that also allow the correction, improvement or enrichment of existing resources?
- How can we deploy a reference subset of the TEI guidelines (TEI Lex) that would serve as a target deployment format readily usable by a variety of tools (presentation, query, hosting/LTA)?
At the end of the week, during the final symposium on 8th December, it was clear to all participants that the masterclass format was particularly appropriate for lexical projects. Comparing practices and improving one’s own technical skills contributed strongly to the quality of the event.
The masterclass contributed to the strategic commitment of both CLARIN and DARIAH to training and education as an essential component of infrastructure building. The discussions of data formats, standards and general data management practices in the lexical domain helped attendees broaden their horizons and acquire new skills.
In addition to the impact on individual researchers who attended the masterclass, all the training materials were made available in the DARIAH WG “Lexical Resources” GitHub repository: https://github.com/DARIAH-ERIC/lexicalresources/tree/master/Events/LexMC2017.
Considering that the DARIAH Working Group “Lexical Resources” plans to hold further lexical data masterclasses, a fruitful cooperation between CLARIN and DARIAH in this field may continue in the future as well.
A more detailed account of all the participants and their projects can be found at https://digilex.hypotheses.org/386.
Blog post by Toma Tasovac and Laurent Romary, republished from CLARIN-EU.