Online event | October 26, 4:30-6:00 pm EDT / 10:30 pm-12:00 am CEST
This Roundtable invites two co-authors of the recently published paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” to speak with three leading digital humanities scholars about the article’s implications for humanities research that employs NLP methods. Together, they will discuss how the authors’ attention to process (data gathering, documentation, standards) and ethics in AI can guide humanists who create data and models for the study of literature, history, and culture.
This paper was published in March 2021 and has, since then, sparked impassioned conversations on the unintended consequences and potential harms of prominent natural language processing (NLP) projects. While this groundbreaking paper has been influential in computer and data science—prompting reflection on the dangers of relying on poorly conceptualized and curated data—it is only beginning to be discussed by humanities scholars who use NLP methods in their research.
- Angelina McMillan-Major (University of Washington, Computational Linguistics)
- Gimena del Rio Riande (University of Buenos Aires, Romance Philology)
- Lauren Klein (Emory University, English and Quantitative Theory & Methods)
- Margaret Mitchell (CEO & Research Scientist, Ethical AI LLC)
- Ted Underwood (University of Illinois, Information Science)
- Toma Tasovac (DARIAH-EU)
This event is part of the ongoing workshop series The New Languages for NLP: Building Linguistic Diversity in the Digital Humanities, held at the Center for Digital Humanities at Princeton and funded by the National Endowment for the Humanities. It is co-sponsored by the Center for Statistics and Machine Learning at Princeton and DARIAH-EU.
The language teams of the NLP workshop series were selected from a field of more than eighty-five applications and were chosen for the potential impact of their projects on current speakers as well as on scholars studying historical languages. Since June 2021, the nineteen participants have been creating linguistic data and training language models for the following world languages:
- Classical Arabic (ٱلْعَرَبِيَّةُ ٱلْفُصْحَىٰ)
- Classical Chinese (文言文, funded by the CDH)
- Kanbun (漢文)
- Ottoman Turkish (لسان عثمانى)
- Quechua (Qheswa simi)
- Dostoevsky’s Russian (funded by the Canadian Social Sciences and Humanities Research Council)
- Tigrinya (ትግርኛ)
- Yiddish (ייִדיש)
- Yoruba (Èdè Yorùbá)
A trans-Atlantic partnership
DARIAH-EU signed a Cooperating Partnership agreement with the Center for Digital Humanities at Princeton University in early 2021. The designation of Cooperating Partner allows the CDH to build on its already impressive range of international activities and expand its relationship with DARIAH itself. Princeton University is DARIAH’s first Cooperating Partner from outside Europe.