Anca Dumitrache

Relation Extraction at Collective Intelligence 2017

We are happy to announce that our project exploring relation extraction from natural language has had two extended abstracts accepted at this summer's Collective Intelligence conference! Here are the papers:

  • Crowdsourcing Ambiguity-Aware Ground Truth: we apply the CrowdTruth methodology to collect data over a set of diverse tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. We show that capturing disagreement is essential for acquiring a high-quality ground truth, by comparing the quality of the data aggregated with the CrowdTruth metrics against majority vote, a method that enforces consensus among annotators (see the first sketch after this list). By applying our analysis over this set of diverse tasks we show that, even though ambiguity manifests differently depending on the task, our theory of inter-annotator disagreement as a property of ambiguity is generalizable.
  • Disagreement in Crowdsourcing and Active Learning for Better Distant Supervision Quality: we present ongoing work on combining active learning with the CrowdTruth methodology to further improve the quality of distant supervision (DS) training data. We report the results of a crowdsourcing experiment run on 2,500 sentences from the open domain. We show that modeling disagreement can be used to identify interesting types of errors caused by ambiguity in the TAC-KBP knowledge base, and we discuss how an active learning approach can incorporate these observations to use the crowd more efficiently (see the second sketch after this list).
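
To make the contrast with majority vote concrete, here is a minimal sketch in Python, on made-up annotations rather than our actual pipeline or data. It illustrates the core CrowdTruth idea: instead of a hard yes/no per relation, each sentence gets a graded sentence-relation score, the cosine similarity between the aggregated worker vector and the relation's unit vector.

    from collections import Counter
    import math

    # Toy annotations: each worker marks the relations they read in one sentence.
    # The data is hypothetical; workers may pick several relations or "none".
    worker_annotations = [
        {"treats"},
        {"treats", "prevents"},
        {"prevents"},
        {"treats"},
        {"none"},
    ]

    RELATIONS = ["treats", "prevents", "causes", "none"]

    def sentence_vector(annotations):
        """Sum the per-worker annotation vectors into one sentence vector."""
        counts = Counter()
        for worker in annotations:
            counts.update(worker)
        return [counts[r] for r in RELATIONS]

    def sentence_relation_score(annotations, relation):
        """Cosine similarity between the sentence vector and the relation's unit vector."""
        vec = sentence_vector(annotations)
        norm = math.sqrt(sum(v * v for v in vec))
        return vec[RELATIONS.index(relation)] / norm if norm else 0.0

    def majority_vote(annotations, relation):
        """1 if more than half of the workers picked the relation, else 0."""
        votes = sum(1 for worker in annotations if relation in worker)
        return int(votes > len(annotations) / 2)

    for rel in ("treats", "prevents"):
        print(rel,
              "majority vote:", majority_vote(worker_annotations, rel),
              "| crowdtruth score: %.2f" % sentence_relation_score(worker_annotations, rel))

On this toy input, majority vote keeps "treats" and discards "prevents" entirely, even though two of the five workers saw it; the graded score keeps that weaker signal as evidence of ambiguity instead of forcing consensus.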

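The second abstract combines this kind of scoring with active learning. One simple selection strategy in that spirit, sketched below with the functions from the previous snippet, is to rank sentences by how unclear their crowd signal is and spend extra crowd budget on the least clear ones first. The clarity measure and the 0.8 threshold are illustrative assumptions on our part, not the selection criterion from the paper.

    # Reuses RELATIONS and sentence_relation_score from the previous sketch.
    def sentence_clarity(annotations):
        """Clarity = the best relation score for the sentence; low clarity marks ambiguity."""
        return max(sentence_relation_score(annotations, r) for r in RELATIONS)

    # Hypothetical pool of crowd-annotated sentences (id -> worker picks).
    pool = {
        "sent-1": [{"treats"}, {"treats"}, {"treats"}],              # clear
        "sent-2": [{"treats"}, {"prevents"}, {"none"}],              # ambiguous
        "sent-3": [{"causes"}, {"causes"}, {"treats"}, {"causes"}],  # fairly clear
    }

    # Rank sentences from least to most clear; the least clear get extra crowd budget.
    ranked = sorted(pool, key=lambda sid: sentence_clarity(pool[sid]))
    needs_more_annotation = [sid for sid in ranked if sentence_clarity(pool[sid]) < 0.8]
    print(needs_more_annotation)  # the ambiguous sentences go back to the crowd first
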
CrowdTruth @ Watson Experience MeetUp

CrowdTruth made an appearance at the Watson Experience MeetUp last week. Together with my IBM colleague Zoltán Szlávik, I talked about the pervasive myths that still influence how we collect annotations from humans. While time and money constraints certainly affect data quality, the common core of all of these issues lies in how we define quality in the first place, and in what value ambiguous data holds. The slides of the talk were based on this paper.

Thank you to Jibes for organizing, Rabobank Utrecht for hosting us, and especially to Loes Brouwers and Tessa van der Eems for setting all of this up!

2 Papers Accepted at ISWC 2015 Workshops

We are happy to announce that two CrowdTruth papers have been accepted at the workshops of the 14th International Semantic Web Conference (ISWC 2015). Both of them present some exciting results from our work on medical relation extraction.

The first one, Achieving Expert-Level Annotation Quality with CrowdTruth: the Case of Medical Relation Extraction, will appear in the Biomedical Data Mining, Modeling, and Semantic Integration (BDM2I) workshop. Download it here, or read the abstract below:

The lack of annotated datasets for training and benchmarking is one of the main challenges of Clinical Natural Language Processing. In addition, current methods for collecting annotation attempt to minimize disagreement between annotators, and therefore fail to model the ambiguity inherent in language. We propose the CrowdTruth method for collecting medical ground truth through crowdsourcing, based on the observation that disagreement between annotators can be used to capture ambiguity in text. In this work, we report on using this method to build a ground truth for medical relation extraction, and how it performed in training a classification model. Our results show that, with appropriate processing, the crowd performs just as well as medical experts in terms of the quality and efficacy of annotations. Furthermore, we show that the general practice of employing a small number of annotators for collecting ground truth is faulty, and that more annotators per sentence are needed to get the highest quality annotations.
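
To give a rough intuition for the last point of the abstract, the claim that a small number of annotators per sentence is not enough, here is a small simulation. It is purely illustrative: the vote probabilities are invented and the score is simplified to a plain fraction of workers, but it shows how much the measured signal for the same sentence fluctuates when only a few workers are used.

    import random
    import statistics

    random.seed(0)

    # Hypothetical sentence: 60% of workers read "treats", 30% "prevents", 10% "none".
    CHOICES = ["treats", "prevents", "none"]
    WEIGHTS = [0.6, 0.3, 0.1]

    def simulated_score(n_workers, relation="treats"):
        """Fraction of sampled workers picking the relation (a simplified stand-in for the score)."""
        picks = random.choices(CHOICES, weights=WEIGHTS, k=n_workers)
        return picks.count(relation) / n_workers

    for n in (3, 5, 10, 20):
        scores = [simulated_score(n) for _ in range(1000)]
        print(f"{n:2d} workers: mean={statistics.mean(scores):.2f} "
              f"stdev={statistics.stdev(scores):.2f}")

The mean barely moves, but the spread with three workers is more than twice that with twenty, which is the instability the abstract warns about.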

The second one, CrowdTruth Measures for Language Ambiguity: the Case of Medical Relation Extraction, will appear in the Linked Data for Information Extraction (LD4IE) workshop. Download it here, or read the abstract below:

A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even out-perform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance over traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others.
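
For readers curious what "weighted metrics" can look like in practice, here is a toy version in Python. It is only an approximation of the metrics defined in the paper: each sentence contributes to precision and recall in proportion to a hypothetical crowd score for the target relation, rather than counting as a full true or false positive.

    # Each item: (classifier says the relation holds, crowd score for that relation).
    # The scores are hypothetical; in CrowdTruth they come from the crowd annotations.
    predictions = [
        (True,  0.95),   # clearly expresses the relation, classifier says yes
        (True,  0.20),   # barely expresses it, classifier says yes
        (False, 0.90),   # clearly expresses it, classifier says no
        (False, 0.10),   # does not really express it, classifier says no
    ]

    THRESHOLD = 0.5  # crowd score above which the relation is taken to hold

    def unweighted_pr(items):
        """Classic precision/recall: every sentence counts fully, via a hard threshold."""
        tp = sum(1 for pred, s in items if pred and s >= THRESHOLD)
        fp = sum(1 for pred, s in items if pred and s < THRESHOLD)
        fn = sum(1 for pred, s in items if not pred and s >= THRESHOLD)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    def weighted_pr(items):
        """Ambiguity-aware variant: each sentence counts in proportion to its crowd score."""
        tp = sum(s for pred, s in items if pred)
        fp = sum(1 - s for pred, s in items if pred)
        fn = sum(s for pred, s in items if not pred)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    print("unweighted P/R: %.2f / %.2f" % unweighted_pr(predictions))
    print("weighted   P/R: %.2f / %.2f" % weighted_pr(predictions))

Under this toy weighting, misclassifying a sentence that clearly expresses the relation costs much more than misclassifying a borderline one, which is the point of evaluating in light of ambiguity.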

Crowdsourcing Disagreement for Collecting Semantic Annotation

This paper proposes an approach to gathering semantic annotation that rejects the notion that human interpretation can have a single ground truth, and is instead based on the observation that disagreement between annotators can signal ambiguity in the input text, as well as in how the annotation task has been designed. The purpose of this research is to investigate whether disagreement-aware crowdsourcing is a scalable approach to gathering semantic annotation across various tasks and domains. We propose a methodology for answering this question that involves, for each task and domain: defining the crowdsourcing setup, collecting experimental data, and evaluating both the setup and the results. We present initial results for the task of medical relation extraction, and propose an evaluation plan for crowdsourcing semantic annotation for several tasks and domains.

Read More...