Welcome to the CrowdTruth blog!

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focussed specifically on collecting gold standard data for training and evaluating cognitive computing systems. The original framework was inspired by the IBM Watson project, which needed improved (multi-perspective) gold standard (medical) text annotation data for training and evaluating various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, in which a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. the expert annotation game Dr. Detective). The framework has a special focus on micro-tasks for knowledge extraction from medical text (e.g. medical documents from sources such as Wikipedia articles or patient case reports). The main steps in the CrowdTruth workflow are: (1) exploring and processing the input data, (2) collecting annotation data, and (3) applying disagreement analytics to the results. These steps are realised in an automatic end-to-end workflow that supports continuous collection of high-quality gold standard data, with a feedback loop to every step of the process. Have a look at our presentations and papers for more details on the research.

Best poster award for CrowdTruth at ICT OPEN 2016

On the 22nd of March we presented our latest work on CrowdTruth at the ICT.OPEN 2016 conference. We are happy to announce that our poster received the best poster award in the Human and the Machine track. Furthermore, Anca Dumitrache gave a presentation and pitched our poster, which earned the 2nd prize for best poster of the whole conference. With almost 200 posters on display, it is an encouraging signal that the importance of the CrowdTruth initiative was recognized.

CrowdTruth @ Watson Experience MeetUp

CrowdTruth made an appearance at the Watson Experience MeetUp last week. Together with Zoltán Szlávik, my colleague from IBM, we talked about the pervasive myths that still influence how we collect annotations from humans. While time and money constraints certainly affect data quality, the common core of all these issues is rather the very definition of quality, and the question of what value ambiguous data has. The slides of the talk were based on this paper.

Thank you to Jibes for organizing, Rabobank Utrecht for hosting us, and especially to Loes Brouwers and Tessa van der Eems for setting all of this up!

CrowdTruth 2.0 released

Today we released version 2.0 of the CrowdTruth framework. In this update the platform's data model has changed, so that data and crowdsourcing results can be managed and reused more easily. This enables several new features, such as project management and permissions: users can create projects and share their crowdsourcing jobs within them. The media search page, where you can search through all the media in the platform, has been updated to accommodate any type of data. Another improvement is the automatic setup of new installations, which makes it easier for new users to get started straight away. You can find a list of the changes in the change log. Try out the platform and get started!

Scientific poster design

Recently the CrowdTruth team got a paper accepted at ICT.OPEN 2016. As part of this upcoming conference, I attended a masterclass on scientific poster design at NWO. The class was given by two professional designers.

The most important thing on your poster is a clear message. You can achieve this by creating a visual focus: instead of giving all images the same size, guide the reader visually through the placement and size of text and images. The main message should be readable from far away, while the finer details can be smaller, for when the reader is up close. To make this work, there should be only one main focus point to start from.

Once you have a starting point, there should be a clear hierarchy throughout the poster. Reduce the number of levels of information as much as possible, to four or five at most. Most of the content of your paper is not suitable for a poster, so use only the most relevant parts, and optionally add more detailed text in a small font size at the bottom. Organize the message systematically with a grid, so that all elements are aligned along it.

Typography is another very important but often forgotten aspect of poster design. Choose one typeface that is easy to read and offers enough options to vary size and style. At the same time, keep the differences in font size to a minimum, matching the hierarchy of the content. Write easy-to-read sentences, and make sure the lines are neither too short nor too long, to improve readability.

The colors of the poster are also important. Do not place text on top of a picture or image with varied colors, as this usually makes it too difficult to read. Adding a drop shadow is not a good solution to this problem; in fact, try never to use shadows. Instead, focus on high contrast between the text and the background color.

For images and graphics, apply the same rules as for text color. Choose the most important image and decide whether it communicates with your audience. One powerful image is better than many random ones. The reading order of the poster can be changed by placing the main element in an unusual position, but then this focus point and the hierarchy that follows from it must be very clear.

Finally, on scientific posters it is best to simply place all logos in a neat row in a color bar at the bottom. They can also be placed vertically, although this is less common and tends to take up more space. When in doubt, just put something big on the poster to catch the attention of the audience. Make your poster stand out from the 200 others in the same room.

Watson Innovation Course Closing Event

On Friday 22 January, Gerard Smit (CTO for IBM Belgium, Netherlands, Luxembourg) and Prof. Hubertus Irth (Vice Dean and Research Director of the Vrije Universiteit Faculty of Earth and Life Sciences and Faculty of Sciences) officially launched the collaboration between the IBM Collaborative Innovation Center (CIC) and the VU Faculty of Sciences. At the event, students of the Watson Innovation course pitched their projects to a mixed crowd of students, scientists, engineers and business clients.

In the Watson Innovation course, students used Watson to answer questions about Amsterdam, with data and a use case provided by Amsterdam Marketing. The app LocalBuddy was selected as the winner, and its students received a prize from Amsterdam Marketing for their achievement.