The CrowdTruth team is preparing to attend the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP), taking place in Zurich, Switzerland, on July 5-8. We are happy to announce that we will be presenting two papers in the main track:
- Capturing Ambiguity in Crowdsourcing Frame Disambiguation (Anca Dumitrache, Lora Aroyo, Chris Welty):
- A Study of Narrative Creation by Means of Crowds and Niches (Oana Inel, Sabrina Sauer, Lora Aroyo):
FrameNet is a computational linguistics resource composed of semantic frames, high-level concepts that represent the meanings of words. We present a crowdsourcing approach to gathering frame disambiguation annotations in sentences, using multiple workers per sentence to capture inter-annotator disagreement. We perform an experiment over a set of 433 sentences annotated with frames from the FrameNet corpus, and show that the aggregated crowd annotations achieve an F1 score greater than 0.67 as compared to expert linguists. We highlight cases where the crowd annotation was correct even though the expert is in disagreement, arguing for the need to have multiple annotators per sentence. Most importantly, we examine cases in which crowd workers could not agree, and demonstrate that these cases exhibit ambiguity, either in the sentence, the frame, or the task itself. We argue that collapsing such cases to a single, discrete truth value (i.e., correct or incorrect) is inappropriate, creating arbitrary targets for machine learning.
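The core idea of multiple workers per sentence can be illustrated with a minimal sketch: turn raw worker votes into a probability vector over frames and use the peak of that vector as a clarity signal, so that low-clarity sentences are flagged as ambiguous rather than collapsed to a single label. The frame names and vote counts below are hypothetical, and the scoring is a simplification of the full CrowdTruth metrics.

```python
from collections import Counter

def frame_vector(annotations):
    """Normalize raw worker frame choices for one sentence into a
    probability vector over frames (a simplified sentence vector)."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {frame: n / total for frame, n in counts.items()}

def clarity(annotations):
    """Score in (0, 1]: 1.0 means all workers agreed on one frame;
    low values signal ambiguity in the sentence, frame, or task."""
    return max(frame_vector(annotations).values())

# Hypothetical worker annotations for two sentences
clear = ["Motion", "Motion", "Motion", "Motion", "Motion"]
ambiguous = ["Motion", "Travel", "Motion", "Travel", "Arriving"]

print(clarity(clear))      # 1.0
print(clarity(ambiguous))  # 0.4
```

A downstream learner could then weight training examples by clarity instead of treating every aggregated label as equally reliable.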
Online video constitutes the largest and continuously growing portion of Web content. Web users drive this growth by massively sharing their personal stories on social media platforms as compilations of their daily visual memories, or as animated GIFs and memes based on existing video material. Therefore, it is crucial to understand the semantics of video stories, i.e., what they capture and how. The remix of visual content is also a powerful way of understanding the implicit aspects of storytelling, as well as the essential parts of audio-visual (AV) material. In this paper we take a digital hermeneutics approach to understand which visual attributes and semantics drive the creation of narratives. We present insights from a nichesourcing study in which humanities scholars remix keyframes and video fragments into micro-narratives, i.e., (sequences of) GIFs. To support narrative creation for humanities scholars, specific video annotations are needed, e.g., (1) annotations that consider literal and abstract connotations of video material, and (2) annotations that are coarse-grained, i.e., focusing on keyframes and video fragments as opposed to full-length videos. The main findings of the study are used to facilitate the creation of narratives in the digital humanities exploratory search tool DIVE+.
We will also appear at the co-located Collective Intelligence event, where we will discuss our paper False Positive and Cross-relation Signals in Distant Supervision Data (Anca Dumitrache, Lora Aroyo, Chris Welty), previously published at AKBC 2017:
Distant supervision (DS) is a well-established method for relation extraction from text, based on the assumption that when a knowledge base contains a relation between a term pair, sentences that contain that pair are likely to express the relation. In this paper, we use the results of a crowdsourcing relation extraction task to identify two problems with DS data quality: the widely varying degree of false positives across different relations, and the observed causal connection between relations that are not considered by the DS method. The crowdsourcing data aggregation is performed using ambiguity-aware CrowdTruth metrics, which capture and interpret inter-annotator disagreement. We also present preliminary results of using the crowd to enhance DS training data for a relation classification model, without requiring the crowd to annotate the entire set.
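The DS assumption, and the false-positive problem it creates, can be shown in a few lines: every sentence mentioning a knowledge-base pair gets that pair's relation label, even when the sentence does not actually express the relation. The knowledge base, entities, and sentences below are hypothetical examples, not data from the paper.

```python
# Distant supervision assumption: if the KB relates (e1, e2), then any
# sentence mentioning both e1 and e2 is labeled with that relation.
KB = {("aspirin", "headache"): "treats"}  # hypothetical knowledge base

sentences = [
    "Doctors often prescribe aspirin for a headache.",  # expresses the relation
    "She took aspirin but her headache got worse.",     # false positive under DS
]

def distant_label(sentence, kb):
    """Label a sentence with every KB relation whose term pair it mentions."""
    text = sentence.lower()
    return [
        (e1, e2, relation)
        for (e1, e2), relation in kb.items()
        if e1 in text and e2 in text
    ]

for s in sentences:
    print(s, "->", distant_label(s, KB))
```

Both sentences receive the `treats` label, but only the first expresses it; crowd annotations with disagreement-aware aggregation are one way to detect and correct such false positives.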
If you are attending HCOMP 2018, we hope you will stop by our presentations!