Dr. Pablo Mosteiro Romero


Assistant Professor
Methodology and Statistics
p.j.mosteiroromero@uu.nl

Research area 

My main research area is Natural Language Processing (NLP). My current research interests are language change and information-theoretic approaches to linguistics. On the language-change front, I study the evolution of synonyms over time in multiple languages. At the same time, I use methods from information theory to study morphology and syntax, and how they trade off, in many languages from all continents. In the past, I have worked extensively on clinical applications of NLP, mostly in psychiatry, and on fairness and explainability in NLP and multimodal systems. I am also interested in experimental reproducibility and data quality.

I am a part of the Sector Plan for Social and Behavioral Sciences, under the theme The Human Factor in New Technologies.

Research skills 

I hold bachelor’s and PhD degrees in Physics, with a focus on experimental particle physics. My main skills from that work are the development of computational tools for experimental control, data acquisition, and data analysis, along with the mathematics required for information theory, which is closely related to statistical physics. These programming skills carry over to my present work, which involves software development and the development and training of machine learning models. I am also skilled in data annotation and statistical analysis.

Projects
General project description

Historical linguists have long sought to understand how language evolves over time, leading to the formulation of various laws, including contradictory ones governing the evolution of synonymy. While recent computational work has attempted to evaluate these laws, limitations in methodologies and data have resulted in inconclusive and conflicting findings. To address these challenges, this project leverages advancements in natural language processing (NLP) to track changes in usage of individual word senses over time, and to thereby assess the validity of two long-standing linguistic laws governing the evolution of synonymy.

Role
Project Leader
Funding
NWO grant
General project description

This research project aims to build upon and refine the findings of the paper "On the Usefulness of Comparable and Parallel Corpora for Contrastive Linguistics. Testing the Semantic Stability Hypothesis" by critically examining and augmenting its statistical methods, evaluating its methodology on quasi-parallel texts without translations, and potentially extending the analysis to machine-generated texts.

Role
Project Leader
Funding
Other
Project
What is a word? What was a word? 29.11.2024 to 29.11.2025
General project description

The concept of word is indispensable in the study of language, yet its theoretical status and even its objective reality are contested. This study explores the concept of word as a fundamental unit through the statistical trade-off between morphology and syntax. Building on existing methodologies, we will investigate this trade-off across different stages of a language's evolution to understand the informational optimality of words. We will replicate and extend a previous study, combining it with an approach developed by one of the applicants that explores the effect of word-boundary manipulations on the trade-off between word order and word structure. Finally, we will evaluate diachronic case studies. Our starting dataset is the Parallel Bible Corpus, but we will also explore other corpora that provide more diachronic information. This work will teach us about the informational optimality of words and will also give us insights into historical language change, shedding light on wordhood from a quantitative perspective.
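The core idea above, that information can be measured with and without word boundaries, can be illustrated with a toy sketch. This is not the project's actual methodology; it simply shows the kind of entropy estimate such studies build on, comparing a unigram entropy over words with one over characters after boundaries are removed (the corpus string is invented).

```python
from collections import Counter
from math import log2

def unigram_entropy(tokens):
    """Shannon entropy (in bits) of the empirical unigram distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

text = "the dog saw the cat and the cat saw the dog"

words = text.split()                  # tokens delimited by word boundaries
chars = list(text.replace(" ", ""))   # same text with word boundaries erased

print(f"word-level entropy:           {unigram_entropy(words):.3f} bits")
print(f"character-level entropy:      {unigram_entropy(chars):.3f} bits")
```

Real studies of this trade-off use far richer estimators (e.g. sequence-level compression or n-gram models over parallel corpora), but the contrast between the two numbers is the seed of the idea: erasing boundaries shifts information from syntax (order of words) into structure (sequences of characters).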

Role
Project Leader & Researcher & Contact
Funding
Other Applied Data Science research grant
Completed Projects
Project
Assessing Reliability of Annotations in the Context of Model Predictions and Explanations 21.12.2023 to 21.03.2025
General project description

With the rise of machine learning models in sensitive areas, such as sexism detection on social media platforms, the accuracy of these models is of paramount importance, and there are many ongoing research and evaluation campaigns in this field, such as EXIST and EDOS. For this task, accurate model predictions matter, but so do explanations for those predictions. Because most datasets used in these studies are annotated by humans, it is important to understand the factors that can influence annotators; assessing the reliability of human annotations thus becomes crucial to ensure the quality of the validation process. In this project, we aim to measure the influence of explanations generated by prediction systems on annotators' agreement, and to compare those annotations with model predictions. Our innovation lies in using explanation techniques to better understand both model and human reliability.
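Annotator agreement, the quantity this project tracks, is conventionally measured with chance-corrected statistics such as Cohen's kappa. A minimal sketch of that statistic follows; the label sets are invented toy data (the project's actual datasets and metrics are not specified here), and the "before/after explanations" framing is only one way such a comparison could be set up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in freq_a)
    return (observed - expected) / (1 - expected)

# Toy data: one annotator's labels before and after seeing model explanations.
before = ["sexist", "not", "sexist", "not", "sexist", "not"]
after  = ["sexist", "not", "sexist", "sexist", "sexist", "not"]
print(f"kappa = {cohens_kappa(before, after):.2f}")  # → kappa = 0.67
```

Kappa of 1 means perfect agreement, 0 means agreement no better than chance; comparing kappa computed with and without exposure to model explanations is one way to quantify how much explanations shift annotator behaviour.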

Role
Project Leader
Funding
Other Applied Data Science research grant