In this tutorial, we introduce the CrowdTruth methodology for crowdsourcing ground truth by harnessing and interpreting inter-annotator disagreement. CrowdTruth is a widely used crowdsourcing methodology, adopted by industrial partners and public organizations (e.g. Google, IBM, The New York Times, the Cleveland Clinic, Crowdynews, The Netherlands Institute for Sound and Vision, and the Rijksmuseum) and applied in a multitude of domains, e.g. AI, news, medicine, social media, cultural heritage, and the social sciences. The central characteristic of CrowdTruth is harnessing the diversity of human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable and realistic real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes; instead, we incorporate them into a richer and more continuous representation of truth. The goal of this tutorial is to introduce the Semantic Web audience to a novel approach to crowdsourcing that takes advantage of the diversity of opinions (human semantics) inherent in the Web. We believe the tutorial is timely, as methods that deal with disagreement and diversity in crowdsourcing have become increasingly popular. Creating this more nuanced notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse, and inclusive.
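To make the idea of a continuous representation of truth concrete, here is a minimal sketch of disagreement-aware aggregation in the spirit of the CrowdTruth metrics. The function name `unit_annotation_scores` and the example relation labels are illustrative, and the sketch deliberately omits the worker-quality and annotation-quality weighting of the full CrowdTruth framework: it simply builds an annotation vector per media unit and scores each label by cosine similarity with that label's basis vector, so dissenting votes lower a score rather than being discarded.

```python
from collections import Counter
from math import sqrt

def unit_annotation_scores(worker_annotations):
    """Simplified CrowdTruth-style unit-annotation scores.

    worker_annotations: one set of selected labels per worker,
    all for the same media unit (e.g. a sentence).
    Returns a dict mapping each label to a score in (0, 1]: the
    cosine similarity between the aggregated unit vector and that
    label's basis vector (no worker-quality weighting here).
    """
    # Aggregate all worker votes into a single unit vector.
    unit_vector = Counter()
    for labels in worker_annotations:
        for label in labels:
            unit_vector[label] += 1
    norm = sqrt(sum(v * v for v in unit_vector.values()))
    # Cosine with a basis vector reduces to count / norm.
    return {label: count / norm for label, count in unit_vector.items()}

# Three workers annotate one sentence with candidate relations;
# the minority vote for "causes" is kept as a graded signal.
scores = unit_annotation_scores([
    {"treats"},
    {"treats", "prevents"},
    {"causes"},
])
```

A majority-vote scheme would output only "treats" here; the continuous scores instead rank "treats" highest while preserving how much support the other interpretations received.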
- Anca Dumitrache, Lora Aroyo and Chris Welty: Crowdsourcing Ground Truth for Medical Relation Extraction. ACM TiiS, Special Issue on Human-Centered Machine Learning (in press).
- Oana Inel and Lora Aroyo: Harnessing Diversity in Crowds and Machines for Better NER Performance. Research Track, ESWC 2017.
- Lora Aroyo, Chris Welty: Truth is a Lie: 7 Myths about Human Annotation. AI Magazine, 2014.
- Lora Aroyo, Chris Welty: The Three Sides of CrowdTruth. Journal of Human Computation, 1(1), 2014.
- Oana Inel, Khalid Khamkham, Tatiana Cristea, Arne Rutjes, Jelle van der Ploeg, Lora Aroyo, Robert-Jan Sips, Anca Dumitrache and Lukasz Romaszko: CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data. ISWC-RBDS 2014.