Dealing with Meaning Variation in NLP

On 1 May 2023, Massimo Poesio took up the position of Professor of Natural Language Understanding at the Faculty of Science’s Department of Information and Computing Sciences. He is a cognitive scientist with a focus on computational linguistics and natural language processing, and has an extensive track record of interdisciplinary collaboration with theoretical linguists, psychologists, and neuroscientists. His research in Utrecht will focus on natural language understanding, in close collaboration with the Faculty of Humanities. In the run-up to the appointment, Poesio was awarded the very first AiNed Fellowship Grant. He will use the grant for a research project called ‘Dealing with meaning variation in Natural Language Processing’.

For more information, visit his website.

Meaning Variation 

The meaning of natural language expressions varies, sometimes dramatically, along several dimensions, including subjective bias (e.g., what is funny or offensive for one person may not be for another; Akhtar et al., 2021; Almanea & Poesio, 2022; Kocon et al., 2021; Leonardelli et al., 2021), ambiguity (e.g., the question of what a pronoun like “he” refers to in a given context; Poesio & Artstein, 2005; Versley, 2008; Recasens et al., 2011; Passonneau et al., 2012; Plank et al., 2014; Pavlick & Kwiatkowski, 2019), and vagueness (e.g., what data dimensions and thresholds do we apply when we call the weather “mild”, or the condition of a patient “stable”? Van Deemter, 2010; Douven et al., 2013).

Variations in meaning raise serious challenges for NLP. These challenges arise from a scientific/technological point of view (e.g., How can systems learn how to interpret particular language expressions? How can these interpretations be evaluated?) and from an application point of view (e.g., What should a social media company do with a post that is offensive according to some people, but not according to others? How should a robot recognise when its interpretations are precise enough?). Industry and the scientific community have now recognised the challenge and started to study the problem (Poesio & Artstein, 2005; Plank et al., 2014; Aroyo & Welty, 2015; Akhtar et al., 2020; Uma et al., 2021b), often in projects led by Prof. Poesio and his team and/or his collaborators. Nonetheless, most of the fundamental questions still need to be addressed.

Objectives and sub-projects. The objective of this project is to carry out fundamental as well as applied research on meaning variation in NLP along several dimensions of variation, exploring the interconnections between them and the implications for NLP research and applications. Two of the projects (P1, P2) carry out foundational research on linguistic theories and statistical tools for analysing variation; two projects (P3, P4) carry out in-depth empirical/computational research into areas of NLP in which variation has been shown to be prevalent, but which have so far resisted analysis using existing mathematical and computational models; the two remaining projects (P5, P6) look at how variation emerges along a temporal dimension, focusing on dialogue. These six projects will thus investigate closely related themes, allowing the researchers working on them to learn from each other and to collaborate closely in an interdisciplinary team.
 

AiNed – The Projects

The AiNed project is organised around six interrelated PhD projects grouped in three clusters with staged starting points. The two Theoretical Foundations projects will start first on 1 May 2023, followed by the two Empirical Analysis projects six months later, followed by the two Explicit Negotiation projects one year later.

Project 1: Theoretical Foundations 1 - Formal Semantics for Vagueness in Interpretation.
This project is concerned with developing mathematical and computational models of the uncertainty arising from vagueness and testing them on large-scale data.
This PhD project will be a collaboration between Computing Science (prof. van Deemter, prof. Poesio) and Linguistics (dr. Nouwen).

Project 2: Theoretical Foundations 2 - Learning under disagreements between annotators
This project will investigate whether the differences between various sources of disagreement (e.g., noise, ambiguity, subjective bias) can be detected using statistical models, and how to use such insight to guide the development of approaches for training and evaluating NLP models with datasets containing disagreements.
This project will be a collaboration between Computing Science (prof. Gatt, prof. Poesio) and Linguistics (dr. Paperno).
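
To give a concrete idea of what training with datasets containing disagreements can involve, here is a minimal sketch in Python. It is an illustration only and not the method the project will develop: it assumes that each item is represented by the share of annotators who chose each label (a "soft label"), and that a model is scored against that distribution with cross-entropy. The function names, class names, and votes are made up for the example.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn raw annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical item on which four annotators disagree (made-up data).
classes = ["offensive", "not_offensive"]
votes = ["offensive", "not_offensive", "offensive", "offensive"]
target = soft_label(votes, classes)

# Two hypothetical model outputs for the same item.
overconfident = [0.99, 0.01]  # ignores the minority reading
calibrated = [0.75, 0.25]     # mirrors the annotator distribution

loss_overconfident = cross_entropy(target, overconfident)
loss_calibrated = cross_entropy(target, calibrated)
```

Under these assumptions, the prediction that mirrors the spread of annotator judgements receives the lower loss, whereas a majority-vote "gold" label would have rewarded the overconfident one.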

Project 3: Empirical analysis of variation 1 - Variation in coreference and reference
Computational models of referring expression interpretation that can learn from datasets with disagreement do not yet exist. The objective of this project is to develop such models, as well as metrics that do justice to interpretative variation.
This PhD project will be a collaboration between Computing Science (prof. Gatt, prof. Poesio), Linguistics (prof. Winter), and brain science.

Project 4: Empirical analysis of variation 2 - Subjectivity and offensive language detection.
The project will develop models for detecting offensive language that take into account the fact that the offensiveness of some content can be controversial. It will use accuracy metrics that take different interpretations of a potentially offensive expression into account (e.g., those developed in P2). One of the objectives of the project is to work on Dutch.
This is a Computational Social Science project, involving Computing Science (dr. Nguyen, prof. Poesio) and external partners.
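
As a rough illustration of an accuracy metric that takes different interpretations into account, the Python sketch below gives a prediction partial credit equal to the share of annotators who chose the same label. This "soft accuracy" is a hypothetical example for this page, not the metric developed in P2, and the annotations are invented.

```python
from collections import Counter


def soft_accuracy(predictions, annotations):
    """Average share of annotators who agree with each predicted label."""
    credit = 0.0
    for predicted, votes in zip(predictions, annotations):
        credit += Counter(votes)[predicted] / len(votes)
    return credit / len(predictions)


# Hypothetical judgements from three annotators on two posts (made-up data).
annotations = [
    ["offensive", "offensive", "offensive"],          # unanimous
    ["offensive", "not_offensive", "not_offensive"],  # controversial
]
predictions = ["offensive", "offensive"]

score = soft_accuracy(predictions, annotations)
```

With a majority-vote gold standard the second prediction would simply count as wrong; here it still earns the one third of the credit that corresponds to the annotator who read the post as offensive.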

Project 5: Explicit negotiation 1 - Conflicting interpretations in dialogue
The project will be concerned with the differences in interpretation that arise from misunderstandings in dialogue, focusing in particular on misunderstandings in coreference and reference. It will develop methods to annotate such misunderstandings in data, as well as conversational agents able to identify such misunderstandings and repair them.
This project lies at the intersection between conversation analysis, NLP (coreference, reference), and conversational agents. It will be a collaboration between Computing Science (dr. Nguyen, prof. Poesio) and Linguistics (prof. Sanders).