Crowdsourcing Inclusivity with CrowdTruth Tutorial @ WebConf 2019
Dealing with diversity of opinions, perspectives and ambiguity in annotated data
The second edition of the CrowdTruth tutorial will be held during the Web Conference 2019, on Monday, May 13th 2019, at the Hyatt Regency in San Francisco, California, USA. Follow updates on Twitter and Facebook via #CrowdTruth, #WebConf2019 and @thewebconf.
In this tutorial, we introduce the CrowdTruth methodology for crowdsourcing ground truth by harnessing and interpreting inter-annotator disagreement. The central characteristic of CrowdTruth is that it uses the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provides more reliable and realistic real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.
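To make the contrast with discarding dissenting votes concrete, here is a minimal sketch in plain Python. It is an illustration only, not the CrowdTruth library or its published metrics: the function names `majority_vote` and `annotation_scores`, the toy judgments and the label set are all assumptions made for this example. Each label is scored by the cosine similarity between the summed vote vector and that label's unit vector, with no weighting by worker or annotation quality.

```python
from collections import Counter
from math import sqrt


def majority_vote(judgments):
    """Keep only the most frequent label; every dissenting vote is discarded."""
    counts = Counter(label for worker in judgments for label in worker)
    return counts.most_common(1)[0][0]


def annotation_scores(judgments, labels):
    """Give every label a continuous score in [0, 1].

    The score is the cosine similarity between the summed vote vector and the
    unit vector of the label. This is a simplified, unweighted illustration
    (an assumption of this sketch), not the CrowdTruth implementation.
    """
    counts = Counter(label for worker in judgments for label in worker)
    norm = sqrt(sum(counts[label] ** 2 for label in labels))
    if norm == 0:
        return {label: 0.0 for label in labels}
    return {label: counts[label] / norm for label in labels}


# Hypothetical judgments: each set holds the labels one worker chose for a sentence.
judgments = [{"cause"}, {"cause"}, {"cause", "treat"}, {"treat"}, {"none"}]
labels = ["cause", "treat", "none"]

print(majority_vote(judgments))
print(annotation_scores(judgments, labels))
```

In this toy case majority voting reports only `cause`, while the continuous scores still show that `treat` received notable support and `none` a little, which is the kind of signal the disagreement-aware view is meant to preserve.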
The goal of this tutorial is to introduce the methodology and provide guided exercises on how to apply it in specific cases. As dealing with disagreement and diversity in crowdsourcing is becoming an increasingly popular topic, this tutorial provides a timely solution. All the materials of the tutorial will be publicly available. We will provide slides, handouts and Python notebooks (both Jupyter and Colab). Familiarity with Python is helpful for getting the most out of this tutorial. However, we envision work in small groups so that people with different levels of prior experience can work together.
The CrowdTruth methodology and framework are widely used and have been adopted by industrial partners and public organizations (e.g. Google, IBM, New York Times, The Cleveland Clinic, Crowdynews, The Netherlands Institute for Sound and Vision, Rijksmuseum) in a multitude of domains, e.g. AI, news, medicine, social media, cultural heritage and social sciences. The data from our experiments, an extensive list of papers and more can be found on the CrowdTruth website.
The first edition of the CrowdTruth tutorial was held during the ISWC 2018 conference.
Tutorial Schedule
Content | Time
---|---
Introduction [slides] | 14:00 – 14:35
CrowdTruth Metrics [slides] | 14:35 – 15:00
Hands-on: Video annotation [Colab] [Github] | 15:00 – 15:30
Coffee Break |
Hands-on: Textual entailment [Colab] [Github] | 16:00 – 16:30
CrowdTruth Task Design [slides] | 16:30 – 17:00
Wrap-up & Discussion | 17:00 – 17:30
Tutorial Organizers
Resources
CrowdTruth Papers
- [CrowdTruth Methodology] Lora Aroyo, Chris Welty: Truth is a Lie: 7 Myths about Human Annotation, AI Magazine 2014. (pdf)
- [CrowdTruth Methodology] Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, Lora Aroyo, Chris Welty: Empirical Methodology for Crowdsourcing Ground Truth, Semantic Web Journal (in publication), 2018.
- [CrowdTruth Framework] Oana Inel, Khalid Khamkham, Tatiana Cristea, Arne Rutjes, Jelle van der Ploeg, Lora Aroyo, Robert-Jan Sips, Anca Dumitrache and Lukasz Romaszko: CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data. ISWC-RBDS 2014.
- [CrowdTruth Metrics] Anca Dumitrache, Oana Inel, Lora Aroyo, Benjamin Timmermans and Chris Welty: CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement, 2018.
- [CrowdTruth Metrics] Lora Aroyo, Chris Welty: The Three Sides of CrowdTruth. J. Human Computation. 1(1). 2014.
- [Use Case] Anca Dumitrache, Lora Aroyo and Chris Welty: Crowdsourcing Ground Truth for Medical Relation Extraction. ACM TiiS Special Issue on Human-Centered Machine Learning 8 (2), 12, 2018.
- [Use Case] Oana Inel and Lora Aroyo: Harnessing Diversity in Crowds and Machines for Better NER Performance. Research Track at ESWC 2017.
- [Use Case] Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas, Lora Aroyo: Studying Topical Relevance with Evidence-based Crowdsourcing. To appear in the Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM), 2018.