Prof. dr. C.J. (Kees) van Deemter

Natural Language Processing

        Symposium “Generative AI: Science or Engineering?”
        Utrecht University, Academiegebouw, 
        Belle van Zuylenzaal (talks) and Westerdijkzaal (food, drinks, reception).
        Friday 5 July 2024 (all day)

This symposium, which takes place on the occasion of my retirement, is organised by the Department of Information and Computing Sciences, in collaboration with the Descartes Centre. The event has been sponsored by those same two departments, by the AiNed project “Dealing with meaning variation in NLP”, and by the UU focus area Human-Centered AI.

For questions about the programme, please email me at

The symposium will explore methodological issues surrounding Deep Learning in Natural Language Processing:

1. METHODOLOGICAL VALIDITY How should the scientific use of Deep Learning in NLP, including Large Language Models such as ChatGPT, be understood? Can the way in which these models are used in academic research be justified from the point of view of the philosophy of science? Is the current wave of research into “explainable” models going to resolve these issues? Are there other NLP methodologies that are more consistent with time-honoured conceptions of scientific method, which have tended to emphasize explanatory value?

2. LIMITATIONS Some authors have argued that there are (provable) limits to what Large Language Models can learn about language and about the world. Can cogent arguments along these lines be made, and if so, what exactly are the limits to what LLMs can learn? And while LLMs produce impeccably worded and plausible looking output, it has so far been difficult to safeguard the truthfulness, and absence of various kinds of bias, of their output. What might be promising ways to address these limitations?

3. INTERDISCIPLINARITY As long as Deep Learning models are black boxes, it is difficult for NLP to learn from, and to contribute to, such academic disciplines as theoretical linguistics, psychology, and mathematical logic, which once were closely connected with NLP. Similarly, there is now a tendency for older NLP work — which emphasized explicit rules and classic Machine Learning — to be overlooked. Is this merely a sign of healthy progress, or is it also a problem? If it is, then how can the problem be mitigated?

On the morning of 6 July, an informal satellite event, called “Computational Linguistics in the Hothouse”, will take place in the Serre of the Botanical Gardens at the UU Science Park. Details about this satellite event will be announced elsewhere.

         Provisional programme (For titles & abstracts, see below)

09.00 – 09.20:  Coffee and tea
09.20 – 09.30:  Introduction by Albert Gatt
09.30 – 10.50:  First session
  Speaker 1: Alexander Koller (Saarland, Computational Linguistics)
  Speaker 2: Dong Nguyen (Utrecht, Information & Computing Sciences)
10.50 – 11.15:  Coffee and tea
11.15 – 12.35:  Second session
  Speaker 3: Michael Franke (Tuebingen, Linguistics)
  Speaker 4: MH Tessler (DeepMind)
12.35 – 13.45:  Lunch
13.45 – 15.45:  Third session
  Speaker 5: Ehud Reiter (Aberdeen, Computing Science)
  Speaker 6: Denis Paperno (Utrecht, Linguistics)
  Speaker 7: Federica Russo (Utrecht, Freudenthal Inst.)
15.45 – 16.20:  Coffee and tea
16.20 – 17.00:  Fourth session
  Speaker 8: Kees van Deemter
17.00 – 18.15:  Reception/Presentation by Marc van Kreveld/Drinks

         Titles and Abstracts of talks

1. Alexander Koller (Saarland, Computational Linguistics)

Title: Untrustworthy and still revolutionary: Some thoughts on how LLMs are changing NLP

There is no doubt that large language models (LLMs) are revolutionizing the field of natural language processing (NLP) in many ways. There are many doubts about whether this is a good thing, whether we will ever be able to overcome their inability to reliably distinguish truth from falsehood, whether there is any place left for pre-LLM models, and how to do good science any more.

I do not have definitive answers to any of these questions, and am personally torn on many of them. In this talk, I will first discuss some recent research on the limitations of LLMs and on overcoming them through the use of neurosymbolic models, in tasks such as semantic parsing and planning. I would then like to share some of my more general thoughts on science and engineering in NLP in the era of LLMs.

2. Dong Nguyen (Utrecht, Information & Computing Sciences)

Title: Collaborative Growth: When Large Language Models Meet Sociolinguistics

In this talk, I will explore the potential synergies between large language models (LLMs) and sociolinguistics. I will explore how LLMs can enhance sociolinguistic research and, conversely, how sociolinguistic insights can inform and improve the development of LLMs.

3. Michael Franke (Tuebingen, Linguistics)

Title: Understanding Language Models: The Japanese Room Argument

Searle’s famous Chinese Room Argument is an excellent tool for probing our intuitions about why rule-based AI systems are not felt to develop internal understanding in spite of superficially great input-output performance in language use. Building on previous related work (e.g., Bender & Koller’s Octopus Test), I aim to develop a thought experiment more parallel to Searle’s, which I call the Japanese Room Argument, to serve as a scaffolding for intuitions about whether language models, in particular autoregressive LMs trained to optimize next-token probability on massive amounts of text, generate understanding /by necessity/ if scaled in training size and model capacity to approximate perfect input-output alignment with humans.

4. MH Tessler (DeepMind)

Title: AI can help humans find common ground in democratic deliberation.

Abstract: Finding common ground on tough issues is a challenge. Could generative AI offer a way forward? I'll discuss how large language models (LLMs) can help groups of people bridge disagreements on social and political issues. We trained an LLM-based ‘deliberative assistant’. The assistant takes in a set of human-written opinions, and its objective is to generate a statement that reflects a consensus view among the group. Human participants preferred the LLM-generated statements to statements written by humans playing the role of mediator, and rated the LLM-generated statements as more informative, clear, and logical. After critiquing these ‘group statements’, discussants tended to update their views and converge on a common position on the issue. Such convergence did not occur when discussants were simply exposed to each other’s views without deliberative assistance. Text embeddings suggested that the LLM responded to the critiques by incorporating dissenting voices while not alienating the majority. These findings highlight new opportunities for people to use LLMs as tools to help find political common ground.

5. Ehud Reiter (Aberdeen, Computing Science)

Title: Challenges in evaluating LLMs

Testing scientific hypotheses about LLMs usually requires evaluating their outputs, but there are many challenges to evaluating LLM outputs in a scientifically rigorous way. I will discuss three such challenges:
* data contamination: a cardinal rule of ML is not to train on test data. Providing clean unseen test data for Internet-scale LLMs can be challenging.
* reproducibility: scientific experiments must be reproducible, but this is difficult with closed LLMs in particular because they are constantly being updated; hence an experiment done today may give different results tomorrow.
* subtle errors: our work on using LLMs in patient-facing applications has shown many subtle-but-important problems in LLM outputs, which are not easy to detect.

6. Denis Paperno (Utrecht, Linguistics)

Title: Transformers as complex rule-based models for language

In this position talk, I argue for viewing neural models as (developing into) an adequate next generation of tools for theoretical understanding of language. Against the backdrop of the golden age of formal linguistics in the second half of the 20th century, current neural approaches seem disruptive and unhelpful. However, in the broader context of the history of linguistics, neural models are easily (developing into) the next logical step in the evolution of tools for linguistic analysis. I will argue along the following lines:

        (1) Historically, much of traditional grammar was concerned with interpreting and classifying linguistic phenomena without predictive goals.

        (2) With the advent of structuralism and generativism, linguistics focused on giving linguistic theory predictive rather than only interpretive power.
              - in formal approaches, linguistic theorizing and description works with rules, often over feature structures
              - rules are a vehicle for generalization and prediction
              - in specific applications, manually written rules can work as well as, or better than, systems based on machine learning.

        (3) Still, the complexity of language is an objective challenge to rule based approaches.
              - often, complexity makes applied rule systems unenlightening even if effective
              - attempts at more enlightening rule systems lead to postulating controversially abstract features and structures
              - predictiveness of rule based systems is limited because they are difficult to design beyond narrowly defined phenomena
              - inducing rules from data is more a grammarian’s art than a scientific procedure that can be automated
              - for some phenomena, simple rules have been argued to be empirically inadequate.

        (4) Neural models, in particular Transformers, offer a promise of overcoming these limitations.
              - a Transformer can be seen as a complex system of rules over feature vectors
              - each Transformer sublayer operation such as Attention or Feedforward operation can be seen as a rule
              - rules and their interactions are learned from data and do not need to be designed manually
              - complexity of models can capture the complexity of language as rules for multiple phenomena are learned at once
              - by nature, Transformer operations are not immediately interpretable, but developments in interpretability research are promising.

        (5) Probing research on trained neural networks is important for linguistic theory.
              - the presence of linguistic features and structures has been established in neural NLP systems
              - with further progress in interpretability, we will be able to operate with trained neural models as explicit representations of rule systems
              - open topics include invariants vs competing theories in trained neural models
              - hidden embedding features may provide objective evidence of abstract structure
              - new methods in linguistic typology are on the horizon.

7. Federica Russo (Utrecht, Freudenthal Inst.)

Title: Validity and Explainability in the era of Generative AI: A Philosophy of Science perspective

Lately, ‘validity’ and ‘explainability’ have attracted a lot of attention in the context of AI and of Generative AI. The terms have also acquired specific meanings, sometimes at variance with a ‘more classic’ use of these terms that stems from Philosophy of Science debates, which have extensively investigated methods across the natural, biomedical, and social sciences.
In this talk, I will briefly revisit some of the main uses of ‘validity’ and ‘explainability’ as they are customarily used in Philosophy of Science, to see whether we can capitalise on them, while building a solid epistemology of Generative AI.

8. Kees van Deemter (Utrecht, Information & Computing Sciences)

Title: Things We Once Believed

In this talk I will cast my mind back to the year 1984, when I graduated in Logic at the University of Amsterdam and started my work in NLP at Philips Nat Lab (Institute for Perception Research). I will list some propositions about NLP and AI that were widely believed at the time, at least in my part of the academic-industrial forest. Regarding each of these propositions, I will ask how it should be assessed 40 years later, in 2024.