Welcome to the CrowdTruth blog!

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focussed specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, in which a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. the expert annotation game Dr. Detective). The framework has a special focus on micro-tasks for knowledge extraction from medical text (e.g. medical documents from various sources, such as Wikipedia articles or patient case reports). The main steps in the CrowdTruth workflow are: (1) exploring and processing the input data, (2) collecting annotation data, and (3) applying disagreement analytics to the results. These steps are realised in an automatic end-to-end workflow that can support the continuous collection of high-quality gold standard data, with a feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.

Watson Innovation Course wins ICT project of the year in education

Yesterday at the Computable Awards, the Vrije Universiteit, the University of Amsterdam and IBM won the prize for “ICT project of the year in education” with the Watson Innovation Course. Furthermore, the project was the highest rated of all nominees across all prize categories. The course is currently running for the second time, with an improved setup and new state-of-the-art tools for the students.

The course is run by Lora Aroyo, Anca Dumitrache, Benjamin Timmermans and Oana Inel from the VU, and Robert-Jan Sips and Zoltan Szlavik from IBM. In the course, the students were challenged by Amsterdam Marketing to address the growing overcrowding by tourists in the city center of Amsterdam. The city is culturally rich, with many places to visit, yet most visitors cluster around a limited set of popular locations. The students came up with ideas to motivate visitors to spread out across the city and to provide them with relevant information for their visit.

Interested in working with IBM and ABN AMRO on an exciting innovation project?

Although the mortgage application process and the regulations surrounding it are clearly mapped, institutionalised and supported by automated systems, the process of orientation in the housing market is not. When looking for an appropriate place to live or to open a business, clients of the bank are confronted with questions and considerations such as the location, the image of and facilities in the neighbourhood, the safety of the neighbourhood, energy labels, average price levels and future development plans, all surrounding one of the largest decisions of their life: the purchase of a house. At present, the bank is unclear about the steps customers take in the orientation process and about which extra services, answers or information it could offer to support them.

Within this 3-month project, IBM, VU and UvA will join forces and use the IBM Open Innovation approach to come up with a new data-driven service concept for one of the leading banks in the Netherlands. The team will consist of IBM consultants (design / business strategy), an IBM programmer, 2 IBM researchers and researchers from VU Amsterdam and UvA.

To strengthen our team we are looking for 3 students for a 3-month internship at IBM, potentially followed by an MSc thesis project that deepens or continues their work, in the following disciplines:

(1) Business/Service Innovation: Students with an entrepreneurial / service innovation background, looking to gain experience in the development of a real-life business case. The student should ideally have experience with focus groups and qualitative interviews, to help gain initial insights into the house orientation process.

(2) Crowdsourcing: Using crowdsourcing and the social web to get a clearer picture of the demands, questions and uncertainties surrounding the purchase of a house. This project will complement the work done by student (1) with quantitative results and a larger scope.

(3) Open Data / Information Retrieval: Finding (open) datasets and retrieving data sources that could provide insights into the questions identified by the work of students (1) and (2).

If you are interested in an internship with IBM / ABN AMRO within the context of this project, please contact Lora Aroyo via lora.aroyo@vu.nl with a short motivation explaining why you would like to work on it, your CV and your availability in the coming 3-4 months.

Digging into Military Memoirs

On the 8th and 9th of September the workshop “Digging into Military Memoirs” took place at the Royal Netherlands Institute of Southeast Asian and Caribbean Studies in Leiden. The workshop, organized by Stef Scagliola, was a great opportunity to get in close contact with researchers and historians working in various fields, such as interviews, oral history and cross-media analysis, among others. During the workshop the participants experimented with digital technologies on the basis of a corpus of 700 documents published about the veterans in Indonesia.

The aim of the workshop was to explain the possibilities of Digital Humanities tools and methods to a group of around 20 historians. The workshop was divided into four sessions (Data Visualization, Open Linked Data, Text Mining and Crowdsourcing), and each session consisted of a short presentation and hands-on assignments to be performed individually or in groups. The main goal of each session was to inform the researchers about the most appropriate tools and applications to use at each stage of their research, in order to generate insights for their work faster and more efficiently.

The crowdsourcing session was developed and presented together with Liliana Melgar. We divided the session into two parts. In the first part, which served as an example to follow, Liliana gave a brief overview of the current state of the art in crowdsourcing approaches in Digital Humanities and other fields. In the second part, the historians were able to experiment with different examples of crowdsourcing tasks and to further develop a project idea (based on their own interests) for which crowdsourcing would be a good fit.

Sign up for the Watson Innovation Course!

Have you ever wondered how we could provide tourists in Amsterdam with the best experience? Now is your chance to develop ideas, business cases and real prototypes of Watson to answer all questions tourists have.

The Watson Innovation course is a collaboration between the Vrije Universiteit, the University of Amsterdam and IBM Netherlands. It offers a unique opportunity to learn about IBM Watson, cognitive computing and the meaning of such artificial intelligence systems in a real-world, big-data context. Students from the Computer Science and Economics faculties will combine their complementary efforts and creativity in cross-disciplinary teams to explore the business and innovation potential of such technologies. Visit the course page to find out all the details.

Crowdsourcing brainstem tumors at Lowlands 2016

Brainstem tumors are a rare form of childhood cancer for which there is currently no cure. The Semmy Foundation aims to increase the survival of children with this type of cancer by supporting scientific research. The Center for Advanced Studies at IBM Netherlands is supporting this research by developing a cognitive system that allows doctors and researchers to analyse MRI scans more quickly and to better detect anomalies in the brainstem.

In order to gather training data, a crowdsourcing event was held at Lowlands, a 3-day music festival that took place from 19-21 August 2016 and welcomed 55,000 visitors. At the science fair, IBM had a booth that hosted both this research and a showcase of the weather stations of the Tahmo project with TU Delft.

In the crowdsourcing task, the participants were asked to draw the shape of the brainstem and tumor in an MRI scan. Gathering data on whether a particular layer of a scan contains the brainstem, and determining its size, should allow a classifier to recognize the tumors. Furthermore, annotator quality can be measured with the CrowdTruth methodology by analysing the precision of the drawn edges in relation to the data we collected on the participants' alcohol and drug use. The hypothesis is that people under the influence can still make valuable contributions, but that these are of lower quality than those of sober people. This may shed more light on the reliability of online crowd workers, since it is unknown under what conditions they make their annotations.
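To make the idea of measuring annotator quality more concrete, here is a minimal sketch in Python. It is not the analysis used in the study: it assumes each drawing has been rasterised into a flat list of 0/1 pixel values, and it scores each worker by the cosine similarity between their drawing and the summed drawings of everyone else, loosely in the spirit of agreement-based quality scores. The function names and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty drawing agrees with nobody
    return dot / (norm_a * norm_b)

def worker_agreement(masks):
    """Score each worker against the combined drawings of all other workers.

    masks: dict mapping a (hypothetical) worker id to a flat list of 0/1
    pixel values; all lists must have the same length.
    """
    n_pixels = len(next(iter(masks.values())))
    total = [sum(mask[i] for mask in masks.values()) for i in range(n_pixels)]
    scores = {}
    for worker, mask in masks.items():
        # Leave the worker's own drawing out of the reference vector.
        rest = [total[i] - mask[i] for i in range(n_pixels)]
        scores[worker] = cosine(mask, rest)
    return scores

# Toy data: two workers draw the same region, one draws elsewhere.
masks = {
    "w1": [0, 1, 1, 0],
    "w2": [0, 1, 1, 0],
    "w3": [1, 0, 0, 0],
}
scores = worker_agreement(masks)
```

In this toy example the two workers who marked the same pixels receive the same, higher score, while the worker whose drawing does not overlap with anyone else's scores zero. Scores like these could then be compared across groups, for instance sober versus non-sober participants.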

The initial results in the heatmap of drawn pixels give an indication of the overall location of the brainstem, but further analysis of the individual scans will follow in order to measure worker quality and generate 3D models.
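As an illustration of how such a heatmap of drawn pixels can be built, the sketch below sums binary drawings pixel by pixel and then keeps the pixels that at least half of the participants marked. It assumes each drawing is a small 2D grid of 0/1 values; the function names, the 0.5 threshold and the toy data are assumptions made for this example, not details of the actual analysis.

```python
def pixel_heatmap(masks):
    """Count, for every pixel, how many drawings marked it.

    masks: list of 2D grids (lists of rows) of 0/1 values, all the same shape.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    heat = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for r in range(rows):
            for c in range(cols):
                heat[r][c] += mask[r][c]
    return heat

def consensus_region(heat, n_drawings, threshold=0.5):
    """Keep pixels marked by at least `threshold` of the drawings."""
    return [
        [1 if count / n_drawings >= threshold else 0 for count in row]
        for row in heat
    ]

# Toy data: three 2x3 drawings of the same scan layer.
drawings = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 0]],
]
heat = pixel_heatmap(drawings)
region = consensus_region(heat, len(drawings))
```

The counts in `heat` can be rendered directly as a heatmap, while `region` gives a rough consensus outline that individual drawings could be compared against.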