PhD defence: Say the Same but Differently: Computational Approaches to Stylistic Variation and Paraphrasing
PLEASE NOTE: If a candidate gives a layman's talk, the livestream will start fifteen minutes earlier.
“Ik ben een Utrechter” and “Ik ben een Utrechtenaar” are two Dutch sentences. A tool like Google Translate might render both as “I am an Utrecht resident”. However, the choice of word matters. “Utrechter” is the more common modern term, while “Utrechtenaar” is the historical standard term for an Utrecht resident. In the 1730s, during a wave of prosecutions of gay men that started in Utrecht, “Utrechtenaar” became closely associated with homosexuality. That history still lingers. Today, when someone calls themselves an “Utrechtenaar” rather than an “Utrechter”, we might learn more about them: for example, that they are more likely to be part of the local queer community. Language technology like Google Translate, however, can lose this nuance.
This dissertation explores how language technology can better handle variation in language. I show that both people and language models struggle to recognize different ways of saying the same thing in the context of conversations. I also find that while the internal representations of language models encode content, they often do not capture differences in linguistic style. To address this, I created a new model that recognizes stylistic variation and is already being used by researchers and practitioners. Finally, I demonstrate that language variation is relevant at every stage of language model design, including basic building blocks such as tokenizers.
Overall, my work encourages the field of natural language processing to consider language variation more rigorously in the development of language technology.
- Location: hybrid, online (livestream link) and, for invited guests, in the Utrecht University Hall, Domplein 29
- PhD candidate: A.M. Wegman
- Dissertation: Say the Same but Differently: Computational Approaches to Stylistic Variation and Paraphrasing
- PhD supervisor: prof. dr. C.J. van Deemter
- Co-supervisor: dr. D.P. Nguyen
- More information: full text via Utrecht University Repository