Benjamin Timmermans

Sign up now for the first edition of the Watson Innovation Course!

Have you ever wondered how we could provide tourists in Amsterdam with the best experience? Now is your chance to develop ideas, business cases and real Watson prototypes that answer the questions tourists have.

The Watson Innovation course is a collaboration between the Vrije Universiteit, University of Amsterdam and IBM. It offers a unique opportunity to learn about IBM Watson, cognitive computing and the meaning of such artificial intelligence systems in a real-world, big data context. Students from the Computer Science and Economics faculties will combine their complementary efforts and creativity in cross-disciplinary teams to explore the business and innovation potential of such technologies. Visit the course page to find out all the details.

Netherlands eScience Symposium


The Netherlands eScience Symposium took place on Thursday 9th of October in the Amsterdam Arena. This yearly event attracts scientists and researchers from many different disciplines. In the digital humanities track, Oana Inel of the CrowdTruth team gave a talk on the DIVE+ project. This digital cultural heritage project provides innovative access to online collections, with the purpose of supporting digital humanities scholars and online exploration by the general public. The project is supported by the Netherlands eScience Center, and used CrowdTruth for the crowdsourcing of events in historical data. The talk, titled “Towards New Cultural Commons with DIVE+”, can be seen below.

Awards at IBM Extreme Blue Expo


Our research group received several awards at the IBM Extreme Blue Expo 2015. At this event, an array of speakers from IBM, academia and the startup community presented their latest findings. The Shared University Research Award was granted to the Web & Media group of the VU University Amsterdam. This award is a global initiative by IBM to stimulate science and the collaboration between IBM and scientists. Furthermore, Lora Aroyo received a faculty award for her work on our project CrowdTruth, and Anca Dumitrache received a PhD Fellowship award for her work on medical relation extraction.

As part of the Shared University Research Award we received access to the Watson Engagement Advisor Research platform, which will be used for collaborative research on methods for the training and evaluation of IBM Watson. In the upcoming months we will jointly host a Watson innovation course for students and Watson masterclasses for professionals. More information will follow at a later stage.

Amsterdam Data Science: Coffee and Data


On Friday 11th of September I pitched the medical relation extraction work of my CrowdTruth colleague Anca Dumitrache at the third Amsterdam Data Science: Coffee and Data event. The purpose was to get in touch with researchers who have medical datasets that are, for instance, incomplete or contain errors. With our research, we want to investigate how we can improve the quality of this data. Several other interesting presentations on data science in the medical domain were given at this event, which was hosted on the top floor of the VU University Amsterdam. Together with Merel van Empel, we also presented our latest work on gamification of crowdsourcing for advancing biology using BioCrowd. Feel free to try out the game and provide us with feedback.

A sound corpus with perceptual representations


On Monday 31st of August I presented the preliminary results of my work on sound representations during the weekly Artificial Intelligence meeting at the VU University Amsterdam. In this collaboration with Emiel van Miltenburg, a sound corpus is built with annotations on how people perceive these sounds. Sounds can often be interpreted in multiple ways, but tags in sound corpora do not directly relate to the acoustic features of sounds. Because of this limited representation of what can be heard in a sound, the ranking of search results is not optimal. In this research, we use crowdsourcing to build an annotated corpus of sounds with meaningful representations that are perceptually grounded. The presented slides can be seen below or on SlideShare.

Welcome to the CrowdTruth blog!

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focussed specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, where a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. Figure Eight and Amazon Mechanical Turk) and applications (e.g. the expert annotation game Dr. Detective). The CrowdTruth framework has a special focus on micro-tasks for knowledge extraction in medical text (e.g. medical documents from various sources, such as Wikipedia articles or patient case reports). The main steps involved in the CrowdTruth workflow are: (1) exploring and processing the input data, (2) collecting annotation data, and (3) applying disagreement analytics to the results. These steps are realised in an automatic end-to-end workflow that can support a continuous collection of high-quality gold standard data, with a feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.
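To give a rough feel for what step (3) could look like, here is a minimal Python sketch of one possible disagreement measure. It is an illustration under our own assumptions, not the metrics or the API of the CrowdTruth framework itself: the function names (`unit_vector`, `cosine`, `worker_agreement`), the relation labels and the example judgments are all hypothetical. For a single input unit, each worker's labels are compared, using cosine similarity, with the summed labels of all other workers.

```python
from collections import Counter
from math import sqrt


def unit_vector(judgments):
    """Sum the labels chosen by all workers for one input unit."""
    counts = Counter()
    for labels in judgments.values():
        counts.update(labels)
    return counts


def cosine(a, b):
    """Cosine similarity between two label-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def worker_agreement(judgments):
    """Compare each worker's labels with those of the remaining workers."""
    scores = {}
    for worker, labels in judgments.items():
        others = {w: l for w, l in judgments.items() if w != worker}
        scores[worker] = cosine(Counter(labels), unit_vector(others))
    return scores


# Hypothetical judgments for one sentence: worker id -> selected relations.
judgments = {
    "w1": ["treats"],
    "w2": ["treats"],
    "w3": ["causes"],
}

scores = worker_agreement(judgments)
```

In this toy example, `w3` shares no label with the other workers and ends up with a lower score than `w1` and `w2`. Whether such a low score signals a poor worker or a genuinely ambiguous sentence is the kind of question disagreement analytics is meant to help answer.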