Text and Data Analytics


Natural language processing (NLP) is the key technological area for analysis of, and knowledge extraction from, unstructured data in text form. It covers a wide array of tasks: from statistical tools such as concordancers, to annotation for sentiment analysis, to parsing for information extraction. Each of these tasks can be achieved through a variety of technical approaches. Every engineering solution is, however, necessarily a trade-off between capabilities: some work reliably at large scale, some require substantial training data, and some make strict assumptions about language.

It is vital for Digital Humanities researchers to have a clear understanding of the capabilities, requirements and limitations of NLP tools. The alternative, treating the computer as a ‘black box’ that spits out results, is methodologically unsound.

This Working Group, in collaboration with VCC1 and VCC2, aims to develop and demonstrate methodologies for applying state-of-the-art NLP to humanities research questions. It will do so by educating humanities scholars about the different technologies, tools and analytical results. The key activities of the WG will be to provide guides to specific best practices, to inventory successful applications of prominent tools, and to demonstrate specific examples of services that can be used in Digital Humanities scholarship.