Relation Extraction at Collective Intelligence 2017

We are happy to announce that two extended abstracts from our project on relation extraction from natural language have been accepted at the Collective Intelligence conference this summer! Here are the papers:

  • Crowdsourcing Ambiguity-Aware Ground Truth: we apply the CrowdTruth methodology to collect data over a set of diverse tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. We show that capturing disagreement is essential for acquiring a high-quality ground truth. We demonstrate this by comparing the quality of data aggregated with CrowdTruth metrics against majority vote, a method that enforces consensus among annotators. By applying our analysis across these diverse tasks, we find that, even though ambiguity manifests differently depending on the task, our theory of inter-annotator disagreement as a property of ambiguity is generalizable.
  • Disagreement in Crowdsourcing and Active Learning for Better Distant Supervision Quality: we present ongoing work on combining active learning with the CrowdTruth methodology to further improve the quality of distant supervision (DS) training data. We report the results of a crowdsourcing experiment run on 2,500 open-domain sentences. We show that modeling disagreement can identify interesting types of errors caused by ambiguity in the TAC-KBP knowledge base, and we discuss how an active learning approach can incorporate these observations to use the crowd more efficiently.
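To make the contrast between majority vote and disagreement-aware aggregation concrete, here is a minimal sketch, assuming a relation extraction task where each worker may select several relations per sentence. It is a simplified, unweighted illustration in the spirit of the CrowdTruth sentence-relation score (cosine similarity between the summed worker vector and the unit vector of a relation); the actual CrowdTruth metrics also weight workers and annotations by quality. The relation labels and example annotations are hypothetical.

```python
import math
from collections import Counter

# Hypothetical relation labels, for illustration only.
RELATIONS = ["treats", "causes", "prevents", "none"]


def label_counts(annotations):
    """Count how many workers selected each relation (each worker gives a set)."""
    counts = Counter()
    for chosen in annotations:
        counts.update(chosen)
    return counts


def majority_vote(annotations):
    """Keep only the most frequent relation, discarding any disagreement."""
    label, _ = label_counts(annotations).most_common(1)[0]
    return label


def relation_scores(annotations):
    """Simplified, unweighted sentence-relation score: cosine similarity between
    the summed worker vector and the unit vector of each relation."""
    counts = label_counts(annotations)
    vec = [counts[r] for r in RELATIONS]
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return {r: 0.0 for r in RELATIONS}
    return {r: v / norm for r, v in zip(RELATIONS, vec)}


# Hypothetical annotations from five workers per sentence.
clear = [{"treats"} for _ in range(5)]
ambiguous = [{"treats"}, {"treats"}, {"treats", "causes"}, {"causes"}, {"prevents"}]

print(majority_vote(clear), majority_vote(ambiguous))
print(relation_scores(clear))
print(relation_scores(ambiguous))
```

Majority vote assigns "treats" to both sentences, so the ambiguity of the second one is lost. The score-based view keeps it: the clear sentence scores 1.0 for "treats", while the ambiguous one spreads its score over "treats", "causes" and "prevents".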

ControCurator demonstration at ICT Open 2017

Our demo of ControCurator, titled “ControCurator: Human-Machine Framework For Identifying Controversy”, will be shown at ICT Open 2017. The demo presents the ControCurator human-machine framework for identifying controversy in multimodal data. The goal of ControCurator is to enable modern information access systems to discover and understand controversial topics and events by bringing together crowds and machines in a joint active learning workflow that creates adequate training data. This workflow allows a user to identify and understand controversy in ongoing issues, regardless of whether there is existing knowledge on the topic.

DIVE+ Presentation at Cross Media Café

On 7 March, the DIVE+ project will be presented at Cross Media Café: Uit het Lab. DIVE+ is the result of a true interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. In this project, we use the CrowdTruth methodology and framework to crowdsource events in news broadcasts from The Netherlands Institute for Sound and Vision (NISV) that are published under open licenses on the OpenImages platform. As part of the digital humanities effort, DIVE+ is also integrated, alongside other media studies research tools, into the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, which aims to support media studies researchers and scholars by providing access to digital data and tools. We develop this project together with the eScience Center, which also funds DIVE+.