Prof. dr. C.J. (Kees) van Deemter

Natural Language Processing

Note: Since I retired to become an emeritus at the UU in March 2024, I'm no longer available as the main supervisor for any of the projects below. I leave these project ideas here for a while, just in case someone else might want to pursue them. (In this case please be aware that some of these projects may have been carried out already, so please check.)


The following items are indications of project ideas I am interested in. These ideas could be addressed as part of a Master's or Bachelor's thesis project. Each idea is open to modification: the actual project plan will be worked out in conversation with the student, to fit the student's skills and interests.


Project: Estimating the difficulty of a formula of First Order Predicate Logic. [Type of project: designing an experiment with human participants and analysing the results statistically.] In previous work, algorithms have been proposed that estimate how difficult a given formula of First Order Predicate Logic is for human logic learners. We want to subject these algorithms to a new empirical test, by means of a controlled experiment. Ultimately we want to use the results of this experiment to design improved algorithms (but that may have to wait until a new project).

Project: Expressing propositional logic formulas in a natural language. [Type of project: A good understanding of First Order Predicate Logic is required. Good programming skills are desirable. Knowledge of Natural Language Generation would be a plus.] In previous work at UU, a Natural Language Generation system has been designed that takes formulas of First Order Predicate Logic (FOPL) as input, and delivers as output English sentences that express the same information as the formula. This system needs to be modified in two ways: (1) The existing system only works for FOPL formulas without equality (=); the challenge is to extend the system in such a way that it can handle formulas with equality as well. (2) The existing system does not take background knowledge into account (for example, the fact that an object cannot be in two places at the same time); the challenge is to build a system that takes background information into account.

Project: Analysing a corpus of quantified expressions. [Type of project: Analysing data from a previous experiment in which human speakers were asked to describe visual scenes. Experience with statistical analyses would be desirable.] In previous work at UU, we have conducted “elicitation” experiments in which speakers of English were asked to describe simple geometrical scenes; the experiment was set up in such a way that quantifiers (words such as “all”, “many”, “most”, “three”) had to be used a lot. The resulting corpus of quantified descriptions cries out for being analysed and compared to other corpora. We want to understand what types of quantifiers speakers use (e.g., How often do they use vague quantifiers, like "many"?), and how they decide when they've said enough.

Project: Simplifying formulas of First Order Predicate Logic (FOPL). [Type of project: Computational logic.] Given a formula p of FOPL, what is the "simplest" formula q such that p and q are logically equivalent? Different versions of this project idea arise, depending on how we define the word "simple". For example, we may define "simple" in terms of the length of the formula, measured for instance by the number of logical operators in it. (Note that the question of whether two such formulas p and q are logically equivalent is, in general, undecidable.)
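To make the operator-count notion of "simple" concrete, here is a minimal sketch in Python. The nested-tuple encoding of formulas, the connective names, and the function operator_count are assumptions made for this illustration only; they are not taken from any existing system. The sketch only measures a formula; it does not check logical equivalence.

```python
# Hypothetical encoding (an assumption of this sketch): an atomic formula is a
# tuple such as ("P", "x"); a compound formula is a tuple whose first element
# names a connective or a quantifier. Predicate names are assumed not to clash
# with the connective and quantifier names below.
CONNECTIVES = {"not", "and", "or", "implies", "iff"}
QUANTIFIERS = {"forall", "exists"}


def operator_count(formula):
    """Count the connectives and quantifiers occurring in a formula."""
    head = formula[0]
    if head in CONNECTIVES:
        # ("and", left, right), ("not", sub), ...
        return 1 + sum(operator_count(sub) for sub in formula[1:])
    if head in QUANTIFIERS:
        # (quantifier, variable, body)
        return 1 + operator_count(formula[2])
    return 0  # atomic formula: no logical operators


# Two logically equivalent formulas with different operator counts:
# p = not exists x (P(x) and not Q(x)),  q = forall x (P(x) implies Q(x))
p = ("not", ("exists", "x", ("and", ("P", "x"), ("not", ("Q", "x")))))
q = ("forall", "x", ("implies", ("P", "x"), ("Q", "x")))

print(operator_count(p), operator_count(q))
```

Under this measure q counts as simpler than p. Other definitions of "simple" (for instance ones that weigh quantifier nesting or negations differently) would need a different scoring function.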


Project: Explicitness of Rhetorical Relations in Mandarin. [Type of project: Corpus analysis and/or design and analysis of a controlled experiment.] In human languages, rhetorical relations (also called discourse relations) can be expressed explicitly, for instance by means of words like "because", "therefore", "although". They can also be left implicit, as when we say "You send him a message, he always responds immediately", where a conditional relation between the two clauses is left implicit ("If you send him a message, then ..."). It has been hypothesised that rhetorical relations in Mandarin are left implicit more often than those in English or Dutch. We want to investigate whether this hypothesis is true. In a follow-up project, we may want to computationally model the behaviour of Mandarin in this regard.

Project: Understanding definiteness in Mandarin. [Type of project: Setting up and analysing an experiment with human participants. Focus is on linguistics and experimentation.] In previous work at UU, we have conducted an experiment in which speakers of English were asked to single out an object or individual in a visual scene, by uttering descriptions such as “The man with a grey moustache”. We also did a similar experiment with speakers of Mandarin, and this led to utterances that could be translated into English as either a definite or indefinite description, for instance “*The* man with a grey moustache” or “*A* man with a grey moustache”, or even “Men with grey moustaches”. We want to do an experiment that seeks to find out how often native speakers of Mandarin are nonetheless able to figure out whether the description in question is definite or indefinite, and whether the referent is singular or plural.


Project: Formalising Graeme Ritchie’s framework for the analysis of jokes. [Type of project: A challenging project that requires strong analytical skills. Experience with web technology would be a plus.] Graeme Ritchie’s latest book, "The Comprehension of Jokes: A Cognitive Science Framework" (Routledge 2018), offers perhaps the most precise framework for analysing jokes that is currently available. The challenge in the present project is to work towards making this framework even more precise, and applicable to the analysis of real jokes. The ultimate plan (which I do not expect to be completed within one project) is to (1) distill from Ritchie’s framework a part, henceforth called the Core Framework, that we understand well and that is small enough to be manageable but large enough to be interesting, (2) design a formal language that is suitable for analysing and annotating jokes using that Core Framework, (3) map the expressions in this formal language to diagrams that can be shown on a web page, and (4) try out this formal language (and these diagrams) on a small corpus of jokes, to see to what extent the new formal language is able to capture what goes on in each joke.


Project: Strengths and weaknesses of deep learning. [Type of project: Although this project may not require the writing of even a single line of code, it can only be done well by someone who has a good understanding of Deep Learning, such as can only be acquired from working with neural NLP models.] Neural models are "hot" in many areas of AI, perhaps nowhere more so than in NLP. Nonetheless, some authors have argued that the way in which neural models are currently used in NLP has important disadvantages (see for example this paper). I'd be interested in (co-)supervising a project whose aim is to assess one or more of the arguments that have been put forward in this area and, if time and expertise allow it, chart the way forward.


Project: Annotating Referring Expressions. [Type of project: Corpus annotation. We need a linguist who likes applying formal theories to language data.] Guanyi Chen and I have designed a formally precise annotation schema for distinguishing between different kinds of Referring Expressions (more specifically: different kinds and degrees of under- and over-specification). We're looking for someone to let human annotators apply this schema to an actual corpus. As a part of this project, we'd like the student to adapt our schema to the corpus, to design an annotation manual, to try out this manual with human annotators, and to test inter-annotator agreement.
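As an illustration of what testing inter-annotator agreement could involve, here is a minimal sketch of Cohen's kappa for two annotators, written in Python. The choice of Cohen's kappa is an assumption (the project description does not prescribe a particular agreement measure), and the label names "under", "over" and "minimal" are hypothetical placeholders, not the categories of the actual schema.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four referring expressions (illustrative data only).
annotator_1 = ["over", "over", "under", "minimal"]
annotator_2 = ["over", "under", "under", "minimal"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

With more than two annotators, or with categories that are ordered by degree of over- or under-specification, a different coefficient (for example a weighted or multi-annotator variant) may be more appropriate.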