Linked Open Data 4 Living Organisms – A DARIAH-EU funded workshop under the OPEN HUMANITIES THEME 2015
by Eveline Wandl-Vogt, Andrea Steiner
The workshop took place in Seminar Room 1 at Wohllebengasse 12-14, 1040 Vienna, Austria, on 3 December 2015 from 10:00 a.m. until 6:00 p.m.
It was organized by the DARIAH partners Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences (Eveline Wandl-Vogt) and Dublin City University, Ireland (Alexander O'Connor), in conjunction with the Natural History Museum Vienna (Heimo Rainer). It was supported by Open Knowledge Austria, Wikimedia Austria and the European Grid Infrastructure (EGI) through the DARIAH-related project EGI ENGAGE – DARIAH Competence Centre. The workshop was integrated into the COST action IS 1305 European Network for e-Lexicography.
The intention behind the workshop was to link current research activities on biodiversity and linguistic diversity. To do so, we connected resources and people from different research fields.
We invited participants to:
- Give an insight into Open Science/Open Humanities
- Give an example of already existing infrastructure (Common Names Service) to support research on biodiversity and linguistic diversity
- Connect researchers working in several disciplines under one umbrella theme of the Humanities and Sciences
- Give examples of data reuse from different disciplines at EUROPEANA and the Open Data Platform opendata.at.
- Discuss possible frameworks for data reuse in the field of Linked Open Data.
- Discuss how to publish data via Common Names Service at EUROPEANA.
- Test open science on an example test set on Common Names for Living Organisms.
- Frame a Virtual Working Group supporting curiosity-driven research on biodiversity and linguistic diversity in an open science framework.
We awarded two student grants of 300 euros each.
Applicants submitted a CV and a letter of motivation, and the organizers made the decision. A contribution to the workshop was required.
We received five applications; Karlien Franco (PhD student at the University of Leuven) and Andrei Scutelnicu (Alexandru Ioan Cuza University of Iași) received the grants.
Both of them provide a short workshop report and contribute research to the infrastructure.
More information is available on our website: http://www.oeaw.ac.at/acdh/de/biodiversity
Welcome and Introduction
Eveline Wandl-Vogt welcomed the participants and gave further details concerning the workshop. She discussed the plans and also the aims of the workshop and of the project.
The main topics of the workshop were:
- What is Open Science?
- How to work with our lexical data?
- How to connect data to the infrastructure (Common Names Service)?
- How to publish data via Common Names Service @ LOD?
- Build up collaborations between humanists (linguists, lexicographers, etc.), software developers and botanists
- Start up a network on research on biodiversity and linguistic diversity – create a virtual working group for sustainable exchange
After that, Eveline Wandl-Vogt briefly introduced DARIAH (basics, goals, high-level principles, etc.), explained how the collaboration between DARIAH and the ACDH works (concerning humanities, infrastructures and lexicography) and how it is linked to the European Network for e-Lexicography.
She introduced the new, ongoing project EGI ENGAGE and the development of the DARIAH competence centre. Subjects such as exchanging knowledge, data enrichment and dictionaries, reusing data, sustainability, heterogeneity and standards were mentioned. Eveline sought to identify the interests and needs of both linguists and botanists and to include them.
The second part of the introduction was given by Heimo Rainer (Natural History Museum Vienna, NHM), who discussed the common names part of the European project OpenUP!, a multimedia project in collaboration with EUROPEANA. The NHM enriched data (1.1 million records) and published it on EUROPEANA. In this framework scientific names and common names are combined, which links biological and linguistic work.
Open questions remain, however, such as:
- How should scientific names be translated?
- How can we model the data in a more appealing way, e.g. with a low-level language or dialect code, a geographical stamp and a time stamp?
- What are the possibilities for collaboration between modern taxonomy and linguistics?
- How can we use and transport our data to open resources?
- How could the data communication work?
- What would the advantages be?
- What are the risks of publishing openly?
He also gave a demonstration of the EUROPEANA portal (http://www.EUROPEANA.eu/portal/) using the example of Bellis perennis (Eng: daisy).
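The modeling question raised above (language or dialect code, geographical stamp, time stamp) can be made concrete with a small sketch. It is only an illustration and not the data model of the Common Names Service: the class `CommonNameRecord`, its field names, the helper `names_for` and the use of two-letter language codes are all assumptions made here. The third example record contains invented placeholder values, not real data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CommonNameRecord:
    """One common name attached to one scientific name (hypothetical model)."""
    scientific_name: str           # e.g. "Bellis perennis"
    common_name: str               # vernacular name as recorded
    language: str                  # language code (two-letter codes assumed here)
    dialect: Optional[str] = None  # finer-grained variety, free text in this sketch
    place: Optional[str] = None    # geographical stamp: where the name was recorded
    year: Optional[int] = None     # time stamp: when the name was recorded


def names_for(records: List[CommonNameRecord],
              scientific_name: str,
              language: Optional[str] = None) -> List[str]:
    """Return the common names of a taxon, optionally filtered by language."""
    return [
        r.common_name
        for r in records
        if r.scientific_name == scientific_name
        and (language is None or r.language == language)
    ]


EXAMPLES = [
    CommonNameRecord("Bellis perennis", "daisy", "en"),
    CommonNameRecord("Bellis perennis", "Gänseblümchen", "de"),
    # Placeholder values only, to show the optional fields in use.
    CommonNameRecord("Bellis perennis", "example-name", "de",
                     dialect="example-dialect", place="example-place", year=1900),
]

print(names_for(EXAMPLES, "Bellis perennis", language="en"))
```

Keeping dialect, place and year optional lets records of very different depth sit in one collection, which is one possible way to deal with the heterogeneity of the source data.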
1. Open science by Stefan Knasberger
To begin with, we got an introduction to what open science is in general and how it is viewed from the perspective of Digital Humanities.
Stefan Knasberger explained the history, definitions and keywords of open science and described its meaning in society and in different research fields. He also gave some examples (activities, institutions and organizations) from Austria.
The following seven terms are important in this context:
- Openness: everybody can use the knowledge (knowledge sharing and transfer)
- Open access: publications
- Open data: sharing data for reuse, projects
- Open source: publishing results, sharing knowledge, creating software for sharing
- Open methodology: sharing of protocols, plans, methods, processes
- Open peer review: critique, discussions, feedback
- Open educational resources: teaching materials
At the same time, Stefan Knasberger also named some obstacles, such as licenses, the benefits private companies and institutes gain from using open science, and the question of how to make use of the data.
2. Open culture @ EUROPEANA by Gerda Koch
The main objective of the presentation was to show what can be done with open data. To this end, Gerda Koch gave a short introduction to EUROPEANA and LoCloud.
EUROPEANA is an open cultural data portal which started in 2008. It is a multi-sided platform where researchers and institutions can make their data, in particular metadata, available to access and share. In this way it is possible to search for a diverse range of data and reuse it.
As mentioned earlier, it is linked with the NHM project OpenUP!.
LoCloud is a best practice network for data input and output. It includes micro services that communicate with each other in order to solve technical or semantic problems. It is used by 2500 small and medium-sized institutions, mainly from the European community. As far as e-Lexicography is concerned, there are 29 open source vocabularies which are connected to web pages of resources.
At the end of the talk some points/questions were discussed:
- What are the advantages and disadvantages of open science?
- What about licenses?
- How does the system work and where does the data go?
- What are the differences between the end-user services, pro services and laboratory services?
Discussion 1: Linked Open Data 4 Living Organisms
Eveline and Heimo went into detail on the Common Names Service and on how people at the workshop could get connected.
Heimo Rainer gave more information about the Common Names Service, such as how the data can be obtained and where it can be published. The main points for the development of the service were also discussed: how scientific names can be linked with common names, in which cases the taxonomy is relevant for this, and which possibilities exist for identifying languages.
The most discussed point was how to standardize dialect names and how they should be described.
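One way to picture the linking of scientific and common names, together with language identification, is as Linked Open Data statements. The sketch below writes each link as one N-Triples line with a language-tagged literal. It is an illustration under stated assumptions and not the output format of the Common Names Service: the base URI `http://example.org/taxon/` and the helper names are hypothetical, and the choice of `rdfs:label` as the property is an assumption made here.

```python
from urllib.parse import quote

# Standard RDF Schema label property.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for taxon identifiers (assumption, not a real service).
BASE = "http://example.org/taxon/"


def escape_literal(text: str) -> str:
    """Escape backslash, double quote and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def taxon_uri(scientific_name: str) -> str:
    """Build a URI for a taxon from its scientific name."""
    return BASE + quote(scientific_name.replace(" ", "_"))


def to_ntriple(scientific_name: str, common_name: str, lang: str) -> str:
    """Return one N-Triples line linking a taxon to a language-tagged common name."""
    return (f'<{taxon_uri(scientific_name)}> <{RDFS_LABEL}> '
            f'"{escape_literal(common_name)}"@{lang} .')


print(to_ntriple("Bellis perennis", "daisy", "en"))
print(to_ntriple("Bellis perennis", "Gänseblümchen", "de"))
```

A plain language tag such as `en` or `de` does not capture a dialect. How to encode that finer level is exactly the standardization question discussed at the workshop, so the sketch leaves it open.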
World café: group work: In this part all participants were divided into three groups of 5-7 people. There were three flipchart papers with different topics:
- Data: already available data, data which we have, data which we could provide (type, period, area, standard, variety, etc.)
- Needs / Requirements: what do we need? what is/would/could be important to have? what don’t we have?
- Research questions: topics in which we are interested
Every group got one paper and had to discuss the particular topic. Then, they had to write down their ideas (brainstorming). After 10 minutes the papers were swapped between the groups, so each group could contribute to each of the three topics. After 40 minutes the group work ended.
The group work opened up discussions on the data available and on the real needs, fears and visions of the participants. It bridged the gap between the example use case of the Database of Bavarian Dialects in Austria (DBÖ) at the Common Names Service, which Eveline introduced, and the data and research questions of the participants.
The group work was fruitful, and the mixed teams were productive.
It allowed people to dig a little deeper into their own data and experiences and to start connecting with new people in the same group.
Discussion 2 and hands on: Interlinking possibilities
The results of the group work were extensively discussed and concrete follow-ups and action points were established.
The following issues were identified as crucial:
1) Data:
- potential of lexical atlases
- several data formats (txt, doc, sql, mysql, …)
- make use of other methods (NLP, …)
2) Requirements / needs:
- documentation / info: how can one connect –> workflow description
- heterogeneity versus standardization
- quality measures
- user feedback
- heterogeneity and human diversity
- quantitative methods (dialectometry, etc.)
- associative methods for lexical data
- NLP (natural language processing)
During the discussion, ideas for follow-up projects emerged and small groups formed around them. Financing was identified as a key issue for proceeding successfully. Eveline invited participants to pilot with some data and then seek further funding on the basis of first results and established workflows.
Conclusions and follow ups
The workshop and especially the results were summarized. Possible partnerships and cooperations were discussed.
1) The Common Names Service will be updated to support querying common names, not just taxa.
2) The organizers will take care of the development of a platform for biodiversity and linguistic research, where data and services are to be published. A pilot has been established and will be run at the Austrian Academy of Sciences (AAS). The platform will be labeled as a follow-up of the workshop.
3) Data is to be published on the open data portal Austria (plan: service available by the end of January 2016).
4) Several datasets collected in the framework of the workshop might serve as a DARIAH competence centre use case.
5) Data from participants, and from people who expressed interest by e-mail but could not attend the workshop, will be discussed for integration into the service, and the procedure will be described. This data will also be labeled with the DARIAH-EU logo.
6) Data reuse and collaboration with EUROPEANA will be tested.
7) Data reuse is being discussed with certain stakeholders such as AGROVOC and WIKIDATA, and interconnections are being tested.
8) A follow-up workshop is planned for August or September 2016, in conjunction with COST ENeL, Open Science AT and Wikimedia. It is intended to involve further stakeholders such as AGROVOC and WIKIDATA (both are interested and already connected, but could not take part in this workshop) and to include another concrete hands-on session like this very productive one.
9) The development of a DARIAH Working Group on the issue is being discussed and would be most welcome.
The Working Group could be a think tank for upcoming collaborations and funding proposals; it could be a starting point for interdisciplinary collaboration towards interdisciplinary and applied Humanities.
The workshop is also documented on the website of the Austrian Centre for Digital Humanities: http://www.oeaw.ac.at/acdh/en/biodiversity