Data Science in Utrecht: introducing a new community

Data Science is intertwined with nearly all life science research in Utrecht. From various perspectives, Utrecht researchers are working on a better understanding of the availability, usability, and application domains of data. They come together in the thematic community of Data Science and Cohorts. Just before the first meeting of the community board, we speak with Professor Carl Moons and Associate Professor Miel Hostens, chair and vice-chair of the newly founded community, about their shared ambitions.

Professor Carl Moons

The conversation is exploratory in nature. "The members of this community board haven't spoken to each other before; we are going in with an open mind," begins Moons, who has a background in epidemiology, methodology, and data science in medical research. Hostens, who describes himself as 'the odd one out', comes from veterinary medicine: "My entire lab does nothing but create data-driven solutions for agriculture."

Both agree that data science is a unifying factor among various scientific disciplines. As Moons aptly puts it: "Almost all scientific disciplines are engaged with data. That makes data science a real connector."

One of the challenges in data science is the management and use of data. This revolves around FAIR data, which stands for Findable, Accessible, Interoperable, and Reusable. Hostens explains: “If data scientists are experts in the field of data structures and FAIR data, then the challenge is to involve everyone in this.” Different clusters around data science arise bottom-up, from faculties. Moons sees an analogy with the Utrecht AI labs (including five focused on healthcare) that have been recently set up around specific themes: "The deliverables are not exactly the same, but there we also make the kind of connection that will further shape data science."

Reflecting on the role of data scientists, Hostens says, "In the past, you brought a statistician and someone with a dataset together. Nowadays, those two together are called a data scientist." Moons agrees, pointing to the broad application areas of data science and the convergence of various experts within their community in the Utrecht Life Science domain.

Applied and Fundamental

Associate Professor Miel Hostens

“The applied side of data science is evident in much research,” says Hostens. “The more fundamental side, which goes beyond the methods and techniques for creating AI algorithms and where questions about ethics, provenance, and governance of data and the jurisprudence around it are discussed, is often less visible,” Moons adds. With the establishment of this community, Moons and Hostens hope to bridge the gap between fundamental and applied research even further and to harness the power of joint expertise.

Cohorts

Regarding the term Cohorts in the title of the community, there is no consensus yet. "These are groups of people with a common denominator that makes them comparable, and they are often followed over time. For example, they have a certain disease, live in a certain neighborhood, or belong to a certain age category. This is a research tool, but what's emerging today are the less controlled ‘Real World Data’ and, as a result, ‘Real World Evidence’. These are not cohorts, but an increasingly important source for data science, also in the life science domain."

One of the surprising benefits of advanced AI is the ability to find patterns in unstructured and unexpected data sources, such as patient forums. In traditional research approaches, such forums were often overlooked or considered anecdotal. With AI, however, researchers can also gain valuable insights from these data, such as identifying unreported side effects of drugs, implants, or other interventions. Using AI on such data sources can provide a richer, more nuanced picture of patient and citizen experiences and outcomes.

Autonomous Understanding of Data

While AI can perform advanced analyses, it still requires human input, especially in preparing and structuring data. Making data FAIR requires a human touch. “Human input not only helps organize the data but also provides context and interpretation to the developed AI algorithms, which would otherwise be missed, leading to wrong conclusions and applications,” says Moons.

The future of data science is clear

With 13 AI labs focusing on Utrecht research areas, five of them specifically tailored to Life Sciences and health(care), Utrecht is at the forefront of data innovation. "We have chosen clear themes, areas where we excel, and that's what we want to build our community around," Hostens says.

Although the role of the data scientist is becoming increasingly important, there are still uncertainties, such as the exact definition of their profile and training needs. "The future of data science is clear, but the allocation of resources can be better," Hostens notes. Moons reflects: "In the past, statisticians, epidemiologists, and later bioinformaticians were bridge builders in the life sciences. Now data scientists are joining them to fulfill this crucial role, both at the fundamental and applied level."

In conclusion, the formation of this community is not only desired but essential for the future of data-driven science in Utrecht.