AI professor Sanne Abeln: “Predictions are improving, but genuine understanding is only growing slowly”
Researcher wants to return to the cause
Professor of AI Technology for Life Sanne Abeln offers insights into the latest developments in her field, the use of AI in the life sciences. She also explains why she believes it is important to place more focus on better understanding how AI models make their predictions, and expresses concern over reliance on foreign big tech companies.

AI is a hot topic, and Abeln, who became a professor at Utrecht University about two years ago, is seeing that firsthand. She is often invited to speak at scientific conferences, and she and her team receive numerous requests for collaborations.
Foundation models
Abeln and her team use AI technology to gain deeper insights into complex biological systems, such as cells, organisms, and ecosystems. Among the tools they work with are so-called foundation models: very large mathematical models trained on enormous datasets. And just as ChatGPT is a foundation model for human language, there are also foundation models for "protein language", which is made up of sequences of amino acids, the building blocks of proteins.
The protein models uncover aspects of protein language that we ourselves do not yet understand.
Abeln: "Foundation models are massive and require a huge amount of computing power to train, so we do not do that ourselves. Instead, we use those models as a starting point and fine-tune them. With relatively little data, we can then teach them something new."
Protein folding
Abeln explains that she and her team use protein language models, for instance, to predict how proteins clump together. This clumping (aggregation) of proteins plays a key role in the development of brain diseases such as Alzheimer's and Parkinson's.
Abeln: "We do not just want to make predictions, we also want to understand the factors on which those predictions are based. We achieve this by mapping properties of the proteins, like their length and surface area. By analysing which properties correlate with the model’s predictions, we see properties that we already knew affect aggregation, but also completely new properties that we had not previously linked to aggregation. This way, the protein models uncover aspects of protein language that we ourselves do not yet understand. This gives us new insights and new leads that we can focus on in the lab."
We are able to uncover very complex relationships that are not easily visible to the naked eye.
Combining two types of data
The team also uses AI models to uncover relationships between two types of data, relationships that would go undetected with other methods. For instance, they combine genetic data from plants with hyperspectral images of the same plants, which reveal in detail how the plants reflect electromagnetic radiation across different wavelengths.
Abeln: "We are trying to see if we can predict genetic variants from the spectra. This allows us to determine from the hyperspectral images whether a specific genetic variation is present or not. It also helps us identify which DNA variations impact a particular wavelength or pattern.
"This is crucial information, because hyperspectral data is typically linked to specific substances that the plant produces. So, if you observe that a certain genetic variation influences the hyperspectral images, it means that the variation is affecting the plant's metabolism. This provides plant researchers with clues about which genetic variations to focus on."
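A toy version of this idea, with simulated data standing in for real plants: a simple classifier predicts from a reflectance spectrum whether a variant is present, and its coefficients point to the wavelength bands where the variant leaves a trace. The model choice (logistic regression) is an illustrative assumption, not the team's actual method.

```python
# Toy sketch: predict whether a genetic variant is present from a plant's
# hyperspectral reflectance spectrum, then inspect which wavelengths
# carry the signal. All data below are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_plants, n_wavelengths = 200, 150

# Simulated spectra; the variant shifts reflectance in a narrow band
spectra = rng.normal(size=(n_plants, n_wavelengths))
variant_present = rng.integers(0, 2, size=n_plants)
spectra[:, 60:65] += 0.8 * variant_present[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    spectra, variant_present, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Wavelength bands with the largest coefficients indicate where in the
# spectrum the genetic variation leaves a trace.
top_bands = np.argsort(np.abs(clf.coef_[0]))[-5:]
print("most informative wavelength indices:", sorted(top_bands))
```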
Tumours
As long as large amounts of data are available, a similar approach can be applied to a wide range of problems, including medical ones. For example, Abeln's team combines genetic data with tumour expression data. Expression data show which genes are "active" and producing proteins.
"Based on expression, we can predict whether a specific mutation is present in the tumour," says Abeln. "At the same time, we can identify which mutations cause changes in expression and, in turn, affect how the tumour develops. We are able to uncover very complex relationships that are not easily visible to the naked eye."
I hope that in Europe, the Netherlands, or Utrecht, we take back control. It is better for the transparency, reliability, and accessibility of these models.
Not just predicting, also understanding
Abeln sees AI models becoming larger and more complex. "We can now identify and predict much more intricate relationships than we could in the past, which is great. But the true understanding of what drives those predictions is still growing only slowly."
That is why Abeln would like to see more focus in her field on understanding where predictions come from. "When developing a drug, it is often important to predict how the disease will progress. But only when you understand what causes the disease can you make targeted drugs based on that. So, it is essential to go back to the cause."
Taking back control
Abeln also voices concerns about reliance on big tech companies. For instance, Meta and Google own the leading foundation models for protein language. "I hope that in Europe, the Netherlands, or Utrecht, we take back control," Abeln says. "It is better for the transparency, reliability, and accessibility of these models. This is important not only for science, but also for the Dutch biotech sector."