
ATRIUM Summer School on Automatic Text Recognition
September 1, 2025 - September 5, 2025

We are pleased to announce the ATRIUM Summer School on Automatic Text Recognition, taking place from September 1–5, 2025, at the DARIAH Coordination Office in Berlin. This intensive five-day training event is designed for researchers, digital humanists, and cultural heritage professionals who want to explore and implement state-of-the-art OCR (optical character recognition) and HTR (handwritten text recognition) technologies in their projects.
You have automatically extracted text from digitized material but are not satisfied with the output? You want to improve the accuracy of your transcribed data? You know how to apply an existing model to your documents but want to go further and learn how to fine-tune a model or train one from scratch? You work in a project team whose complex data raise specific difficulties that need particular attention during automatic text recognition? You want to tackle advanced recognition issues? Then the ATRIUM Summer School on ATR is made for you.
About the Summer School
The ATRIUM Summer School will provide an in-depth approach to automatic text recognition, with a focus on practical applications in concrete research scenarios. Participants will gain insights into the latest developments in OCR and HTR, with particular attention to open-source tools such as eScriptorium and to workflows that facilitate the digitization and analysis of historical and modern texts.
Over the course of the week, the trainers will alternate between methodological input and supervised hands-on sessions in which participants improve their own automatic text recognition pipelines. Input will cover not only pre-processing, segmentation, layout analysis, and post-processing, but also data management, enabling participants to reach concrete goals in managing, processing, and reusing their data during the summer school and beyond.
What to expect?
- Expert-led lectures and practical sessions on ATR tools and techniques
- Hands-on training with open-source platforms such as eScriptorium
- Case studies showcasing real-world applications in research and cultural heritage
- Opportunities for participants to work on their own projects and receive expert feedback
- Networking with peers and leading scholars in the field
Who should apply?
- Researchers in the humanities and social sciences
- Digital humanists and computational linguists
- Archivists, librarians, and cultural heritage professionals
- Developers and data scientists working with textual data
Basic familiarity with the digital humanities and at least some experience with OCR/HTR are required, as we will move quickly into practice.
We will consider applications from teams of up to two members working on the same project that requires an automatic text recognition pipeline, provided they have well-defined objectives for the summer school week and a dataset to work on.
Technical Requirements
The ATRIUM Summer School will support you in processing and transcribing your documents. It is therefore essential that you bring your own dataset, such as scanned pages of the documents you want to transcribe. Please ensure that the digitization is of good quality: the dataset should not contain overly noisy images (blurry images, stains, tears, etc.), as these would severely hinder the recognition process. Finally, although very high resolution is not necessary, we recommend a minimum resolution of 300 dpi to ensure reliable recognition by the software.
Participants are expected to bring their own laptop.