The DARIAH Working Group Lexical Resources brings together experts working on tools and methods for the creation and dissemination of lexical resources. In addition to functioning as a forum for scholarly exchange and organising training activities, the WG leads the development of TEI Lex-0, a technical specification and a set of community-based recommendations for encoding machine-readable dictionaries.
The work of the WG fosters research excellence through the creation and diffusion of new knowledge, the imparting of new skills and the development of new technical standards, while promoting scholarly networks and close collaboration across institutional and national borders.
The work of the WG is aimed at lexicographers, linguists, humanities researchers and data modeling experts, especially those interested in standards and interoperability.
Dictionaries lie at the core of humanity’s ability to conceptualise, systematise and convey meaning. Indeed, a dictionary is many things at once: it is a text, a tool, a model of language, and a cultural artifact deeply embedded in the historical moment of its production [1]. The DARIAH Working Group Lexical Resources has been established to help scholars create, transform and study dictionaries as digital objects.
The activities of the DWGLR have been seminal in integrating and sustaining previous work on encoding dictionaries; improving the coverage of lexicographic data in the TEI Guidelines; propelling the scholarly debate around modeling lexicographic data; championing the importance of open standards to a new generation of scholars; and establishing TEI Lex-0 as an internationally recognised data interchange format. In recognition of these activities, the WG was awarded the 2020 Rahtz Prize for TEI Ingenuity by the TEI Consortium.
The need for a stricter customisation of the TEI standard aimed specifically at encoding lexical resources was established during the COST Action European Network of e-Lexicography (ENEL), which ran from 2013 to 2017 [2]. Upon the completion of the Action, the WG took it upon itself to implement TEI Lex-0 as an open-source project on GitHub [3] and make it available for easy online consultation [4].
The distributed, bottom-up model of a DARIAH Working Group proved flexible enough to attract wide community participation [5] while providing the institutional stability necessary for a standardisation project of this magnitude.
In addition to creating the customised, examples-rich TEI Lex-0 Guidelines, which lexicographers, researchers and students can use to learn more about the best practices for encoding lexical data, the WG has contributed to the development of TEI itself by improving the handling of lexicographic data in the TEI Guidelines via a number of proposals which were subsequently accepted by the TEI Council [6-12].
“Members of the DARIAH Working Group Lexical Resources have made a valuable contribution to the Dictionaries Chapter of the TEI Guidelines. Their efforts and their expertise have been formidable and highly appreciated by the TEI community over the years.”
Martina Scholger, Chair, TEI Technical Council
Members of the WG have presented at conferences and published papers not only on the rationale and the overall features of TEI Lex-0 [13], but also on a range of specific lexicographic topics such as the encoding of written and spoken forms [14], etymologies [15], multiword expressions [16], usage labels [17] and the challenges of encoding complex academic dictionaries [18]. This wide range of topics is indicative not only of the robustness of TEI Lex-0 itself, which can be used to describe all the elements of the dictionary macro-, micro- and mesostructure, but also of the contribution which members of the WG are making to the wider scholarly debate around modeling lexicographic data. For a full bibliography, see the TEI Lex-0 Zotero Group [19].
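To give a sense of what encoding written and spoken forms in a machine-readable dictionary entry can look like in practice, the following is a minimal sketch in Python. The sample entry is hypothetical: it is loosely modeled on the kind of markup TEI Lex-0 recommends (an identified entry with a lemma form, a grammatical group and a sense), but it has not been validated against the TEI Lex-0 schema, and the function name `summarise_entry` is an assumption introduced for illustration only. The online guidelines [4] remain the authoritative source.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Hypothetical sample entry written for this illustration; not taken from
# the TEI Lex-0 guidelines and not validated against the schema.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="dictionary" xml:lang="en">
  <form type="lemma">
    <orth>dictionary</orth>
    <pron>ˈdɪkʃənəri</pron>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
  </gramGrp>
  <sense xml:id="dictionary.1">
    <def>a reference work listing words and their meanings</def>
  </sense>
</entry>
"""


def summarise_entry(xml_text):
    """Extract the headword, pronunciation, part of speech and definitions."""
    entry = ET.fromstring(xml_text.strip())
    return {
        # xml:id and xml:lang live in the reserved XML namespace.
        "id": entry.get(f"{{{XML_NS}}}id"),
        "lang": entry.get(f"{{{XML_NS}}}lang"),
        # Written form of the lemma.
        "lemma": entry.findtext("tei:form[@type='lemma']/tei:orth", namespaces=NS),
        # Spoken form of the lemma.
        "pron": entry.findtext("tei:form[@type='lemma']/tei:pron", namespaces=NS),
        # Part of speech, expressed as a typed <gram> element.
        "pos": entry.findtext("tei:gramGrp/tei:gram[@type='pos']", namespaces=NS),
        # One definition per sense.
        "definitions": [d.text for d in entry.findall("tei:sense/tei:def", NS)],
    }


if __name__ == "__main__":
    for key, value in summarise_entry(SAMPLE_ENTRY).items():
        print(f"{key}: {value}")
```

Because the written form, the spoken form and the grammatical information each sit in a predictable place, the same few lookups can in principle be applied to any dictionary encoded along these lines, which is the practical benefit of a stricter, shared customisation.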
TEI Lex-0 and best practices in lexical data modeling have been introduced to more than 90 young scholars from across Europe at a number of training events, including Lexical Data Masterclasses (Berlin, 2017 and 2018) [20, 21, 22, 23]; courses at the Lisbon Summer School in Linguistics (2018 and 2019) [24, 25]; and a DH Training Workshop: Digital Methods for Linguistic Investigation (Berlin, 2019) [26].
“Participating in the Lexical Data Masterclass 2018 was a fascinating experience: I learned about the principles of TEI Lex-0 and how to work efficiently with XML editors. Perhaps most importantly, I got a chance to consult with experts from the DARIAH Working Group Lexical Resources on my own dictionary project. These exchanges proved to be especially valuable now that I’m writing my PhD thesis. Looking back, I realize how much this intense learning experience shaped my path as a young researcher.”
Marija Žarković, PhD Student at the Universitat Autònoma de Barcelona
TEI Lex-0 has been recognised as an interchange format (together with Ontolex-Lemon) of the European Lexicographic Infrastructure (ELEXIS) [27], which currently has 17 partner and 50 observer institutions from 35 countries. The ELEXIS Observers Network is constantly growing, and the global outreach of TEI Lex-0 is expected to grow with it.
In addition, TEI Lex-0 has been chosen as the native encoding format in two recently funded projects: Electronic lexical database of Indo-Iranian languages funded by The Technology Agency of the Czech Republic [28]; and MORDigital: The Digitization of the Diccionario da Lingua Portugueza by António de Morais Silva funded by the Portuguese Foundation for Science and Technology [29].
“We chose TEI Lex-0 for our new project MORDigital because we wanted to use a standard, well-documented and interoperable format for encoding Diccionario da Lingua Portugueza. We hold in high regard the consistency of TEI Lex-0 as well as the opportunity to engage directly with colleagues from the DARIAH Working Group Lexical Resources.”
Prof. Rute Costa, Universidade NOVA de Lisboa
In recognition of the international impact that TEI Lex-0 has had on the modeling of lexicographic data, the DWGLR was awarded the 2020 Rahtz Prize for TEI Ingenuity, which is given to an individual or team judged to have made a significant contribution to the TEI Consortium’s mission of “developing and maintaining a set of high-quality guidelines for the encoding of humanities texts” [30].
[1] Tasovac, T. (2010). Reimagining the Dictionary, or Why Lexicography Needs Digital Humanities. Proceedings from Digital Humanities 2010: http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-883.html
[2] https://www.elexicography.eu/
[3] https://github.com/DARIAH-ERIC/lexicalresources
[4] https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html
[5] For a list of DWGLR members who contributed to the development of TEI Lex-0, see https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#index.xml-body.1_div.1_div.2_div.3
[6] https://github.com/TEIC/TEI/issues/1791
[7] https://github.com/TEIC/TEI/issues/1809
[8] https://github.com/TEIC/TEI/issues/1512
[9] https://github.com/TEIC/TEI/issues/1702
[10] https://github.com/TEIC/TEI/issues/1510
[11] https://github.com/TEIC/TEI/issues/1734
[12] https://github.com/TEIC/TEI/issues/1810
[13] Romary, Laurent and Toma Tasovac. 2018. "TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources." Presentation at TEI Conference in Tokyo.
[14] Bański, Piotr, Jack Bowers and Tomaž Erjavec. 2017. "TEI-Lex0 guidelines for the encoding of dictionary information on written and spoken forms." eLex 2017 Proceedings: 485-494.
[15] Bowers, Jack, Alex Herold and Laurent Romary. 2018. "TEI-Lex0 Etym – towards terse(r) recommendations for the encoding of etymological information". Presentation at TEI Conference in Tokyo.
[16] Tasovac, Toma, Ana Salgado and Rute Costa. 2020. "Encoding Polylexical Units with TEI Lex-0: A Case Study." Slovenščina 2.0: Empirical, Applied and Interdisciplinary Research 8(2), 28-57.
[17] Salgado, Ana, Rute Costa and Toma Tasovac. 2019. "Improving the Consistency of Usage Labelling in Dictionaries with TEI Lex-0." Lexicography 6: 133–156.
[18] Salgado, Ana, Rute Costa, Toma Tasovac and Alberto Simões. 2019. "TEI Lex-0 In Action: Improving the Encoding of the Dictionary of the Academia das Ciências de Lisboa." eLex 2019 Proceedings: 417-433.
[19] https://www.zotero.org/groups/2711819/tei_lex-0/library
[20] https://lexmc.sciencesconf.org/
[21] https://digilex.hypotheses.org/386
[22] https://lexmc18.sciencesconf.org/
[23] https://digilex.hypotheses.org/551
[24] https://clunl.fcsh.unl.pt/en/lisbon-summer-school-linguistics-2018-noticia/
[25] https://clunl.fcsh.unl.pt/en/lisbon-summer-school-in-linguistics-2019/
[27] https://elex.is/
[28] https://www.isvavai.cz/cep?s=jednoduche-vyhledavani&ss=detail&n=0&h=TL03000369
[30] https://tei-c.org/2020/11/23/2020-rahtz-prize-for-tei-ingenuity/
Lead Author: Toma Tasovac
Cite as
Tasovac, Toma. (2021). DARIAH Impact Case Study series: Advancing the State of the Art: DARIAH Working Group Lexical Resources. Zenodo. https://doi.org/10.5281/zenodo.7311454