by Veerle Van den Daelen (DARIAH-BE)
As contemporary historians begin to explore the opportunities that digital humanities can offer, a major obstacle remains: collection-holding institutions face a myriad of challenges in sharing their data and metadata with digital research infrastructures. Too often, attention goes to developing methodologies and tools for the researcher, while too little time and energy are invested in ensuring that the data-providing institutions can live up to expectations that were formed in a micro-test environment (on one corpus or one type of source).
While many initiatives have been taken to set up digital humanities resource discovery services and virtual research environments (VREs), the access to, and sharing of, descriptive metadata has been held back. Often this is because metadata was created in bespoke systems for internal use only and was never published in a machine-usable form to the outside digital world. Projects such as CENDARI and EHRI have been pioneers in merging metadata from repositories preserving Europe’s twentieth-century history, but they too were confronted with this problem, to the extent that the initial concept behind these projects (harvesting existing descriptions, at archival collection or other levels, into a portal) has proven challenging to deliver. Practice has taught that in most cases contact between an infrastructure project and an archive initiates internal and external discussions, as well as unstructured one-off changes to procedures in response to the request for metadata. In very many cases considerable manual work is required to prepare the data for integration into the research portals: substantial bespoke adjustments to the metadata and/or its structure must be carried out by either the content-providing institution or the aggregating project’s team. These methods of preparing and publishing metadata are not sustainable for data providers or aggregators. Experiences in the EHRI and CENDARI infrastructure projects make clear that collection-holding institutions are often not ready to share their data in a sustainable format. There is a clear risk that this data will remain available only in institutional or static aggregations, where it will not be suitable for future research applications.
The DARIAH Open Humanities project started with a workshop in Brussels on 29 and 30 September 2015, at which the five initiative-taking institutions pooled the knowledge on this subject gathered in projects such as CENDARI and EHRI. They drew on the projects’ experiences and on their published and unpublished reports (surveys of the tools and standards used by the institutions, workshops on the topic, and the pre-processing needed to integrate data from different institutions). The partners will draft a text for collection-holding institutions that shares this knowledge and raises awareness of the benefits of having documented workflows for opening up their holdings as standardized digital metadata. This text will be supplemented with an overview of possible scenarios and working points for archivists to take into account.
A second workshop will be organised in Brussels on 9 and 10 December 2015. The draft text, scenarios and possible working points will be discussed with a group of collection-holding institutions. The invited participants, potential future candidates for research infrastructures, are interdisciplinary institutions (collections, memory, public history and research) dealing with European conflicts in twentieth-century history. The workshops and their outcomes will address the technical and organisational challenges that collection-holding institutions are facing, and best practices and possible solutions will be shared among the participants. What are the archives’ strategic plans and policies for responding to requests for metadata and data? How can an archive integrate sustainable publishing of metadata (and data) into its existing archival workflows? How does an archive ensure that its metadata validates against the standards used? Equally, what does an archive need to know from a research infrastructure looking to integrate metadata and especially access to data? How should user-generated data content and access to confidential data be managed, in combination and separately? In many ways, a self-analysis of the policies and procedures of contemporary archival institutions is needed. It should explore topics such as data management policies and procedures in general, standards used, capabilities and capacities of staff, internal organisational structure and dynamics, quality assurance procedures, privacy protection, digital publishing strategy, and the funding of such endeavours and efforts. Options need to be explored for overcoming the challenges in reaching the goal: standards-compliant, publishable data and metadata that are managed and governed by the providers but available for both institutional use and reuse within advanced research environments.
As such, this initiative fully subscribes to DARIAH’s core mission to bring together national, regional, and local endeavours and communities of practice to form a cooperative infrastructure where complementarities and new challenges are clearly identified and acted upon. As we wish to offer a platform for archivists to reflect on their methodologies for sustainable publishing and sharing of their metadata and data, this project sits at the heart of DARIAH’s mission to ensure that accepted standards and best practice examples are followed.
The main partner in this project is the Centre for Historical Research and Documentation on War and Contemporary Society (CEGESOMA, Brussels, part of DARIAH-BE). The four other initiative-taking partners are Data Archiving and Networked Services (DANS, The Hague, part of DARIAH-NL), the NIOD Institute for War, Holocaust and Genocide Studies (NIOD, Amsterdam, part of DARIAH-NL), Trinity College Dublin (TCD, Dublin, part of DARIAH-IE), and the Institute for the Study of Totalitarian Regimes and the Security Services Archive (USTR and ABS, Prague). The initiative-taking partners thus bring together institutions founded at different times (NIOD in 1945, CEGESOMA in 1967, DANS in 2005, and USTR and ABS in 2007), which therefore combine different histories of practice and have taken different approaches to the organisation of, and technology used for, managing their archival materials. Moreover, in this project EHRI and CENDARI join forces in their mission to ensure the description and availability of Europe’s heritage in an interoperable way and with the highest possible quality and future utility.