CrowdTruth Tutorial: How to build ambiguity-aware ground truth

In this tutorial, we introduce the CrowdTruth methodology for crowdsourcing ground truth by harnessing and interpreting inter-annotator disagreement. The central characteristic of CrowdTruth is that it uses the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provides more reliable and realistic real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.
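To make the contrast with vote-discarding methods concrete, here is a minimal Python sketch. It is an illustration only: the toy data, the label set and all function names are hypothetical and are not part of the CrowdTruth library, and the per-annotation fraction computed here is a deliberately simplified stand-in for the actual CrowdTruth metrics. It shows how a majority vote collapses five workers' answers into one label, while a summed annotation vector keeps the minority answers visible as graded scores.

```python
# Hypothetical toy example: five workers annotate one sentence.
# Each worker may select any subset of the candidate annotations.
ANNOTATIONS = ["cause", "treat", "none"]  # assumed label set, for illustration

worker_choices = [
    {"cause"},
    {"cause", "treat"},
    {"treat"},
    {"cause"},
    {"none"},
]


def annotation_vector(choices, annotations=ANNOTATIONS):
    """Encode one worker's selection as a binary vector over the annotations."""
    return [1 if a in choices else 0 for a in annotations]


def unit_vector(all_choices, annotations=ANNOTATIONS):
    """Sum the worker vectors element-wise into one vector for the unit."""
    vectors = [annotation_vector(c, annotations) for c in all_choices]
    return [sum(column) for column in zip(*vectors)]


def majority_vote(all_choices, annotations=ANNOTATIONS):
    """Keep only the most selected annotation; all other votes are discarded."""
    counts = unit_vector(all_choices, annotations)
    return annotations[counts.index(max(counts))]


def soft_scores(all_choices, annotations=ANNOTATIONS):
    """Fraction of workers selecting each annotation (simplified, unweighted)."""
    n_workers = len(all_choices)
    counts = unit_vector(all_choices, annotations)
    return {a: c / n_workers for a, c in zip(annotations, counts)}


if __name__ == "__main__":
    print("unit vector:  ", unit_vector(worker_choices))
    print("majority vote:", majority_vote(worker_choices))
    print("soft scores:  ", soft_scores(worker_choices))
```

In this toy case the majority vote reports only "cause", whereas the graded scores also show that two of the five workers read the sentence as "treat", which is the kind of signal a disagreement-aware approach retains.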

The goal of this tutorial is to introduce the methodology and provide guided exercises on how to apply it in specific cases. As interest in dealing with disagreement and diversity in crowdsourcing is growing, this tutorial is timely. All the materials of the tutorial will be publicly available. We will provide slides, handouts and Python notebooks (both Jupyter and Colab). Familiarity with Python is helpful for getting the most out of this tutorial. However, we envision work in small groups, so that people with different levels of prior experience can work together.

The CrowdTruth methodology and framework are widely used for crowdsourcing and have been adopted by industrial partners and public organizations, e.g. Google, IBM, New York Times, The Cleveland Clinic, Crowdynews, The Netherlands Institute for Sound and Vision, Rijksmuseum, in a multitude of domains, e.g. AI, news, medicine, social media, cultural heritage, social sciences. The data from our experiments, an extensive list of papers and more can be found on the CrowdTruth website.

The first edition of the CrowdTruth tutorial will be held during the ISWC 2018 conference, on Monday, October 8th, 2018, at Hotel Asilomar in Monterey, California, USA. Follow updates on Twitter and FB for #CrowdTruth #ISWC2018 @ISWC2018.

Tutorial Schedule

Content | Time | Materials
Opening Session | 09:00 – 09:15 | Introduction of presenters and material
Session 1: Introduction to CrowdTruth | 09:15 – 10:30 | Presentation [slides]
Coffee Break
Session 2: CrowdTruth Task Design & Building Annotation Vectors | 11:00 – 12:30 | (25 min) Presentation [slides]; (45 min) Hands-on Exercises [slides] [hand-out]; (20 min) Reflections [slides]
Session 3: CrowdTruth Data Processing & CrowdTruth Metrics | 14:00 – 15:30 | (25 min) Presentation [slides]; (20 min) Getting started with CrowdTruth [slides] [guide]; (45 min) Hands-on Exercises [slides] [hand-out]
Coffee Break
Session 4: Results & Closing Reflections | 16:00 – 17:00 | Presentation [slides]

Tutorial Organizers

Lora Aroyo

Anca Dumitrache

Oana Inel

Chris Welty


