Welcome to the CrowdTruth blog!

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focussed specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, where a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. the expert annotation game Dr. Detective). The CrowdTruth framework has a special focus on micro-tasks for knowledge extraction in medical text (e.g. medical documents from various sources, such as Wikipedia articles or patient case reports). The main steps involved in the CrowdTruth workflow are: (1) exploring and processing the input data, (2) collecting annotation data, and (3) applying disagreement analytics to the results. These steps are realised in an automatic end-to-end workflow that can support a continuous collection of high-quality gold standard data, with a feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.

Lisbon Machine Learning Summer School 2017 – Trip Report

In the second half of July (20th of July – 27th of July) I attended the Lisbon Machine Learning Summer School (LxMLS2017). As every year, the summer school is held in Lisbon, Portugal, at Instituto Superior Técnico (IST). The summer school is organized jointly by IST, the Instituto de Telecomunicações, the Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC-ID), Unbabel, and Priberam Labs.

Around 170 students (mostly PhD students but also master students) attended the summer school. It’s important to mention that around 40% of the applicants are accepted, so make sure you have a strong motivation letter! For eight days we learned about machine learning with focus on natural language processing. The day was divided into 3 parts: lectures in the morning, labs in the afternoon and practical talks in the evening (yes, quite a busy schedule).

Morning Lectures

In general, the morning lectures and the labs mapped really well onto each other: first we learned the notions and then we put them into practice. During the labs we worked with Python and IPython Notebooks. Most of the labs had the base code already implemented and we just had to fill in some functions. However, for some of the lectures/labs this wasn’t that easy. I’m not going to discuss the morning lectures in detail, but I’ll mention the speakers and their topics (also, the slides are available on the website of the summer school):

  • Mario Figueiredo: an introduction to probability theory which proved to be fundamental for understanding the following lectures.
  • Stefan Riezler: an introduction to linear learners using an analogy with the perceptual system of a frog, i.e., given that the goal of a frog is to capture any object of the size of an insect or worm providing it moves like one, can we build a model of this perceptual system and learn to capture the right objects?
  • Noah Smith: gave an introduction to sequence models such as Markov models and Hidden Markov models, and presented the Viterbi algorithm, which is used to find the most likely sequence of hidden states.
  • Xavier Carreras: talked about structured predictors (i.e., given training data, learn a predictor that performs well on unseen inputs), using a named entity recognition task as a running example. He also discussed Conditional Random Fields (CRF), an approach that gives good results in such tasks.
  • Yoav Goldberg: talked about syntax and parsing, providing many examples of their use in sentiment analysis, machine translation and other tasks. Compared to the rest of the lectures, this one had much less math and was easy to follow!
  • Bhiksha Raj: gave an introduction to neural networks, more exactly convolutional neural networks (CNN) and recurrent neural networks (RNN). He started with the early models of human cognition, associationism (i.e., humans learn through association) and connectionism (i.e., the information is in the connexions and the human brain is a connectionist machine).
  • Chris Dyer: discussed modeling sequential data with recurrent networks (but not only). He showed many examples related to language models, long short-term memories (LSTMs) and conditional language models, among others. However, even if it’s easy to think of tasks that could be solved by conditional language models, most of the time the data does not exist, a problem that seems to appear in many fields.
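
To make the Viterbi algorithm from Noah Smith’s lecture a bit more concrete, here is a minimal sketch in Python (the language we used in the labs). This is not the lab code from the summer school: the function name `viterbi`, the two hidden states and all the probabilities below are made up for illustration only.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state sequence, its probability) for an HMM."""
    # best[t][s]: probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that most likely path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state and follow the back-pointers to recover the path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical numbers): hidden health states, observed symptoms
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For this toy model the most likely hidden sequence is Healthy, Healthy, Fever. In practice one would usually work with log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long sequences.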

Practical Talks

In the last part of the day we had practical talks or special talks on concrete applications that are based on the techniques learnt during the morning lectures. During the first day we were invited to attend a panel discussion named “Thinking machines: risks and opportunities” at the conference “Innovation, Society and Technology”, where 6 speakers from the AI field (Fernando Pereira – VP and Engineering Fellow at Google, Luís Sarmento – CTO at Tonic App’s, André Martins – Unbabel Senior researcher, Mário Figueiredo – Instituto de Telecomunicações at IST, José Santos Victor – president of the Institute for Systems and Robotics at IST and Arlindo Oliveira – president of Instituto Superior Técnico) discussed the benefits and risks of artificial intelligence and automatic learning. Here are a couple of thoughts:

  • Fernando Pereira: In order to enable people to make better use of technology, we need to make machines smarter at interacting with us and helping us.
  • André Martins pointed out an interesting problem: people spend time on solving very specific things but these are never generalized. -> but what if this is not possible?
  • Fernando Pereira: we build smart tools but only a limited number of people are able to control them, so we need to build the systems in a smarter way and make the systems responsible to humans.

Another evening hosted the Demo Day, an informal gathering that brings together a number of highly technical companies and research institutions, all with the aim of solving machine learning problems through technology. There were a lot of enthusiastic people to talk to, and many demos and products. I even discovered a new crowdsourcing platform, DefinedCrowd, which might soon start competing with CrowdFlower and Amazon Mechanical Turk.

Here are some other interesting talks that we followed:

  • Fernando Pereira – “Learning and representation in language understanding”: talked about learning language representation using machine learning. However, machine understanding of language is not a solved problem. Learning from labeled data or learning with distant supervision may not yield the desired results, so it’s time to go implicit. He then introduced the paper “Attention Is All You Need” by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin. In this paper, the authors claim that you do not need complex CNN or RNN models, and that attention mechanisms alone are enough to obtain high-quality machine translation results.
  • Graham Neubig – “Simple and Efficient Learning with Dynamic Neural Networks”: dynamic neural network toolkits such as DyNet can be used as alternatives to TensorFlow or Theano. According to Graham, here are some advantages of using such nets: the API is closer to standard Python/C++ and it’s easier to implement nets with varying structure. There are also some disadvantages: it’s harder to optimize graphs (but still possible) and it’s also harder to schedule data transfer.
  • Kyunghyun Cho – “Neural Machine Translation and Beyond”: showed why sentence-level and word-level machine translation is not desired: (1) it’s inefficient at handling the various morphological variants of words, (2) we need good tokenisation for every language (not that easy), (3) it is not able to handle typos or spelling errors. Therefore, character-level translation is what we need, because it’s more robust to errors and handles rare tokens better (which are actually not necessarily rare).
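
For readers who have not seen the attention mechanism mentioned in Fernando Pereira’s talk, below is a minimal sketch of scaled dot-product attention, i.e. softmax(QKᵀ / √d_k) V, written in plain Python. It only illustrates this core operation under simplifying assumptions: the full model in the paper adds multi-head attention, positional encodings, feed-forward layers and more. The function names `softmax` and `attention` and the small input vectors are my own choices for illustration.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (lists of floats)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


# Hypothetical toy input: one query attending over two key/value pairs
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(queries, keys, values))
```

In this toy input the query is more similar to the first key, so the output is pulled towards the first value vector.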

A Concentric-based Approach to Represent Topics in Tweets and News

[This post is based on the BSc. Thesis of Enya Nieland and the BSc. Thesis of Quinten van Langen (Information Science Track)]

The Web is a rich source of information that presents events, facts and their evolution across time. People mainly follow events through news articles or through social media, such as Twitter. The main goal of the two bachelor projects was to see whether topics in news articles or tweets can be represented in a concentric model where the main concepts describing the topic are placed in a “core”, and the less relevant concepts are placed in a “crust”. In order to answer this question, Enya and Quinten built on the research conducted by José Luis Redondo García et al. in the paper “The Concentric Nature of News Semantic Snapshots”.

Enya focused on the tweets dataset and her results show that the approach presented in the aforementioned paper does not work well for tweets. The model had a precision score of only 0.56. After a data inspection, Enya concluded that the high amount of redundant information found in tweets makes it difficult to summarise them and to identify the most relevant concepts. Thus, after applying stemming and lemmatisation techniques, data cleaning and similarity scores together with various relevance thresholds, she improved the precision to 0.97.

Quinten focused on topics published in news articles. When applying the method described in the reference article, Quinten concluded that relevant entities from news articles can indeed be identified. However, his focus was also to identify the most relevant events that are mentioned when talking about a topic. In addition, he calculated a term frequency-inverse document frequency (TF-IDF) score and an event-relation (temporal relations and event-related concepts) score for each topic. These combined scores determine the new relevance score of the entities mentioned in a news article. These changes improved the ranking of the events, but did not improve the ranking of the other concepts, such as places or actors.
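
As a reminder of what a TF-IDF score looks like, here is a minimal sketch in Python. It uses the textbook variant (relative term frequency multiplied by the logarithm of the inverse document frequency). The exact weighting used in the thesis, and the way it is combined with the event-relation score, are not described in this post, so the function name `tf_idf` and the toy documents are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(
            {
                # term frequency (relative) times inverse document frequency
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical toy corpus of two already tokenised "articles"
articles = [
    ["election", "vote", "vote"],
    ["election", "storm"],
]
for article_scores in tf_idf(articles):
    print(article_scores)
```

A term that occurs in every document (here “election”) gets a score of zero, while terms that are specific to one document get a positive score.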

Below you can find the final presentations that the students gave to present their work:

A Concentric-based Approach to Represent News Topics in Tweets
Enya Nieland, June 21st 2017

The Relevance of Events in News Articles
Quinten van Langen, June 21st 2017

Collective Intelligence 2017 – Trip Report

On June 15-16 the Collective Intelligence conference took place at New York University. The CrowdTruth team was present with Lora Aroyo, Chris Welty and Benjamin Timmermans. Together with Anca Dumitrache and Oana Inel we published a total of six papers at the conference.


The first keynote was presented by Geoff Mulgan, CEO of NESTA. He set the context of the conference by stating that there is a problem with technological development, namely that it only takes knowledge out of society and does not put it back in. Also, he made it clear that many of the tools we see today like Google Maps are actually nothing more than companies that were bought and merged together. This combination of things is what creates the power. He also defined what the biggest trends are in collective intelligence: the observation e.g. citizen generated data on floods, predictive models e.g. fighting fires with data, memory e.g. what works centers on crime reduction, and judgement e.g. adaptive learning tool for schools. Though, there are a few issues with collective intelligence: Who pays for all of this? What skills are needed for CI? What are the design principles of CI? What are the centers of expertise? These are all not yet clear. However, what is clear is that there is a new field emerging through combining AI with CI: Intelligence Design. We used to think systems resolve this intelligence, but actually we need to steer and design it.

In a plenary session there was an interesting talk on public innovation by Thomas Kalil. He defined the value of concreteness as things that happen when particular people or organisations take some action in pursuit of a goal. These actions are more likely to effect change if you can articulate who needs to do what. He said he would like to identify the current barriers to prediction markets and areas where governments could be a user and funder of collective intelligence. This can be achieved through connecting people that are working to solve similar problems locally, e.g. in local education. Then change can be driven realistically, by making clear who needs to do what. However, it was also noted that people need to be willing and able for change to work.

Parallel Sessions

There were several interesting talks during the parallel sessions. Thomas Malone spoke about using contest webs to address the problem of global climate change. He claims that funding science can be both straightforward and challenging: for instance, government policy does not always correctly address the needs of a domain, and conflicts of interest may even exist. Also, it can be tough to convince the general public of the use of fundamental research, as it is not sexy. Furthermore, digital entrepreneurship is something that is often overlooked. There are hard problems, and there are new ways of solving them. It is essential now to split the problems up into parts, solve each of them with AI, and combine them back together.

Chris Welty presented our work on Crowdsourcing Ambiguity Aware Ground Truth at Collective Intelligence 2017.

Mark Whiting also presented his work on Daemo, a new crowdsourcing platform that has a self-governing marketplace. He stressed the fact that crowdsourcing platforms are notoriously disconnected from user interests. His new platform has a user-driven design, in order to get rid of the flaws that exist in, for instance, Amazon Mechanical Turk.

Plenary Talks

Daniel Weld from the University of Washington presented his work on argumentation support in crowdsourcing. Their work uses argumentation support in crowd tasks to allow workers to reconsider their answers based on the argumentation of others. They found this to significantly increase the annotation quality of the crowd. He also claimed that humans will always need to stay in the loop of machine intelligence, for instance to define what the crowd should work on. Through this, hybrid human-machine systems are predicted to become very powerful.

Hila Lifshitz-Assaf of NYU Stern School of Business gave an interesting talk on changing innovation processes. The process of innovation has changed from a lone inventor, to labs, to collaborative networks, and now into open innovation platforms. The main issue with this is that the best practices of innovation fail in the new environment. In standard research and development there is a clearly defined and selectively permeable boundary, whereas with open innovation platforms this is not the case. Experts can participate from inside and outside the organisation. Open innovation means managing undefined and constantly changing knowledge, in which anyone can participate. For this to work, you have to change from being a problem solver to a solution seeker. It is a shift from thinking “the lab is my world” to “the world is my lab”. Still, problem formulation is key, as you need to define the problems in ways that cross boundaries. The question always remains: what is really the problem?

Poster Sessions

In the poster sessions there were several interesting works presented, for instance work on real-time synchronous crowdsourcing using “human swarms” by Louis Rosenberg. Their work allows people to change their answers through the influence of the rest of the swarm of people. Another interesting poster was by Jie Ren of Fordham University, who presented a method for comparing the divergent thinking and creative performance of crowds with that of experts. We ourselves had a total of five posters covering both poster sessions, which were received well by the audience.

Collective Intelligence Slides and Papers

Here is the full list of our papers published at the Collective Intelligence 2017 conference:

Chris Welty also presented our work on Crowdsourcing Ambiguity Aware Ground Truth at Collective Intelligence. The slides are available here:

ESWC 2017 – Trip Report

Between the 28th of May and the 1st of June 2017 the 14th Extended Semantic Web Conference took place in Portorož, Slovenia. As part of the CrowdTruth team and project, Oana Inel presented her paper written together with Lora Aroyo on the first day of the conference. More about the paper that was presented can be found in a previous post. On the last day of the conference, Lora was the keynote speaker.

The Semantic Web group at the Vrije Universiteit Amsterdam had other great presentations. During the Scientometrics Workshop Al Idrissou talked about the SMS platform that links and enriches data for studying science. During the poster and demo session people were invited to check SPARQL2Git: Transparent SPARQL and Linked Data API Curation via Git by Albert Meroño-Peñuela and Rinke Hoekstra. Furthermore, the Semantic Web group had a candidate paper for the 7-year impact award “OWL reasoning with WebPIE: calculating the closure of 100 billion triples”, by Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen and Henri Bal.


I’ll start by writing a couple of words about the keynotes, which this year covered a wide range of areas, domains and subjects. In the first keynote presentation at ESWC 2017, on Tuesday, Kevin Crosby, from RavenPack, stressed the importance of data as a factor in decision making for financial markets. In his talk entitled “Bringing semantic intelligence to financial markets”, he focused on the current issues related to data analytics in decision making: the lack of skills and expertise, the quality and completeness of data and the timeliness of data. However, the most important issue is the fact that although we live in the age of data, only around 29% of the decisions in the financial market are made based on data.

The second keynote speaker was John Sheridan, the digital director of The National Archives in the UK. While giving a nice overview of British history, he talked about how semantic technologies are used to preserve history at The National Archives in the UK, in a talk entitled “Semantic Web technologies for Digital Archives”. Nowadays, semantic technologies are widely used to make cultural heritage collections publicly available online. However, people still struggle to search and browse through archives without having the context of the data. As a take-home message, we need to work towards second-generation digital archives that measure risks, provide trust evidence, redefine context, embrace uncertainty, and enable use and access.

On the last day of the conference Lora Aroyo gave her keynote presentation, “Disrupting the Semantic Comfort Zone”. Lora started her keynote by looking back into the history of Semantic Web and AI and how her own journey embraced the changes along the way. Something was clear: the humans were always in the centre and they still continue to be. The second part of the presentation focused on introducing the underlying idea of the CrowdTruth project. As a final note, I’ll leave here the following question from Lora: “Will the next AI winter be the winter of human intelligence or not?”

NLP & ML Tracks

During the ML track, Federico Bianchi presented an approach that uses active learning to rank semantic associations. The problem is well known: we have an information overload in contextual KB exploration, and even for small amounts of text there is a lot of data to be considered. In order to determine which semantic associations are most interesting to users, the paper “Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs” defines a ranking function based on a serendipity heuristic, i.e., relevance and unexpectedness.

The paper “All that Glitters Is Not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo draws attention to the current gold standards and makes claims similar to the ones we presented in our paper: the gold standards do not share a common set of rules for annotating named entities, they are not thoroughly checked and they are not refined and updated to newer versions. Thus, the need for the EAGLET benchmark curation tool for named entities!

Using semantic annotations for providing better access to scientific publications is a subject that has recently caught the attention of many researchers. Sepideh Mesbah, PhD student at Delft University of Technology, presented “Semantic Annotation of Data Processing Pipelines in Scientific Publications”, a paper that proposes an approach and workflow for extracting semantically rich metadata from scientific publications, by classifying the content of scientific publications and extracting the named entities (objectives, datasets, methods, software, results).

Jose G. Moreno presented the paper “Combining Word and Entity Embeddings for Entity Linking” which introduces a natural idea for entity linking by using a combination of entity and word embeddings. The claims of the authors are the following: you shall know a word by the company it keeps and you shall know an entity by the company it keeps in a KB, word context by alignment, word/entity context by concatenation.

Social Media Track

The Social Media track started with a presentation by Hassan Saif – “A Semantic Graph-based Approach for Radicalisation Detection on Social Media”. The approach presented in the paper uses semantic graph representation in order to discover patterns among pro- and anti-ISIS users on social media. Overall, pro-ISIS users tend to discuss religion, historical events and ethnicity, while anti-ISIS users focus more on politics, geographical locations and intervention against ISIS. The second presentation – “Crowdsourced Affinity: A Matter of Fact or Experience” by Chun Lu – took us to a different domain – a travel destination recommendation scenario that is based on user-entity affinity, i.e., the likelihood of a user to be attracted by an entity (book, film, artist) or to perform an action (click, purchase, like, share). The main finding of the paper was that in general, a knowledge graph helps to assess the affinity more accurately, while a folksonomy helps to increase its diversity and novelty. The Social Media Track had two papers nominated for best student research paper – the aforementioned paper and the paper “Linked Data Notifications” presented by Sarven Capadisli, Amy Guy, Christoph Lange, Sören Auer, Andrei Sambra and Tim Berners-Lee. The latter was also the winner!

In-Use and Industrial Track

Social media was highly relevant for the In-Use track as well. The Swiss Armed Forces is developing a Social Media Analysis system aiming to detect events such as natural disasters and terrorist activity by performing semantic tweet analysis. If you want to know more, you can read the paper “ArmaTweet: Detecting Events by Semantic Tweet Analysis”. This track also had nominations for best in-use paper. The winning paper in this category was “smartAPI: Towards a More Intelligent Network of Web APIs”, presented by Amrapali Zaveri.

Open Knowledge Extraction Challenge

During the Open Knowledge Extraction challenge, Raphaël Troncy presented the participating system ADEL – an adaptable entity extraction and linking framework, which was also the winning entry of the challenge. The ADEL framework can be adapted to a variety of different generic or specific entity types that need to be extracted, as well as to different knowledge bases to disambiguate against, such as DBpedia and MusicBrainz. Overall, this self-configurable system tries to solve a difficult problem with current NER tools, i.e., the fact that they are only tailored for specific data, scenarios and applications.


On Monday, during the second day of workshops, I attended two workshops: the 3rd International Workshop on Semantic Web for Scientific Heritage (SW4SH 2017) and Semantic Deep Learning (SemDeep-17), now in its first edition. During the SW4SH 2017 workshop, Francesco Beretta gave a detailed keynote, entitled “Collaboratively Producing Interoperable Ontologies and Semantically Annotated Corpora”, in which he presented a couple of projects for digital humanities (symogih.org, the corpus analysis environment TXM, among others) and how linked (open) data, ontologies, automated tools for natural language processing and semantics are finding their place in the daily projects of humanities scholars. However, all these tools, approaches and technologies are not 100% embraced, as humanities scholars are seldom content with precision values of 90% and they feel the urge to manually tweak the data until it looks perfect.

During SemDeep-17, Sergio Oramas presented the paper “ELMDist: A vector space model with words and MusicBrainz entities”. This article points out that it is still unclear how NLP and semantic technologies can contribute to Music Information Retrieval areas such as music and artist recommendation and similarity. The approach presented uses NLP processing in order to disambiguate the entities from the musical texts and then runs the word2vec algorithm over this sense-level space. Overall, their results are promising, meaning that textual descriptions can be used in order to improve the Music Information Retrieval area. The last paper of the workshop, “On Semantics and Deep Learning for Event Detection in Crisis Situations”, was presented by Hassan Saif. As the title suggests, the paper tries to solve the problem of event detection in crisis situations from social media, using Dual-CNN, a semantically enhanced deep learning model. Although the model has successful results in identifying the existence of events and their types, its performance drops significantly when identifying event-related information such as the number of people affected or the total damages.