Lieke Vree: a Star in Bioinformatics at the VetGalaxy Project
Collaboration for Efficient Veterinary Data Analysis

Lieke Vree is a bioinformatician at the Expertise Center for Veterinary Genetics at Utrecht University. Together with molecular geneticist Frank van Steenbeek and the central Research Data Management Support department, she is working on the VetGalaxy pilot project to ensure researchers can analyze data streams efficiently and intuitively. This unique collaboration is interesting not only for other geneticists but also for other research departments and faculties.
How did you join the Expertise Center for Genetics in Veterinary Medicine?
My mother is a veterinarian, and she used to advise a breed association on breeding practices. I accompanied her to a lecture organized by the Expertise Center for Genetics, and I became very enthusiastic about the research. I first did an internship there, and for the past three years, I have been a proud part of the team as a bioinformatician.
Can you briefly explain what the VetGalaxy pilot entails?
In veterinary medicine, we are increasingly using big data. For instance, we receive SNP data from various laboratories regarding DNA variants and full genome sequencing data. These laboratories obtain the data through veterinary practices or owners who participate in the research with their dogs. All this data is combined into a large dataset, which currently includes thousands of dogs.
What challenges do you face?
The data we receive from different companies and laboratories can vary greatly in format and structure; the same information might be recorded differently from one source to the next. A master's student in bioinformatics, Marilijn van Rumpt, developed scripts to automate the merging of genetic data from different sources. We use these scripts in the user-friendly Galaxy platform so that everyone can work with them. The data is cleaned, quality-checked, and then merged with the existing dataset.
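As a rough illustration of the clean, check, and merge steps described above, here is a minimal Python sketch. It is not the VetGalaxy scripts themselves: the field names (`sample`, `SampleID`, `Marker`, `Call`), the genotype notations (`A/G` versus `ga`), and all function names are hypothetical, chosen only to show how two differently formatted sources could be brought into one shape.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (sample ID, SNP ID)


def normalize_genotype(raw: str) -> Optional[str]:
    """Return a genotype as a sorted two-letter string, or None if unusable.

    Assumption: sources differ only in case and separators, e.g. "A/G" vs "ga".
    Sorting the alleles ignores phase and strand, which is a simplification.
    """
    alleles = [c for c in raw.upper() if c in "ACGT"]
    if len(alleles) != 2:
        return None  # missing or malformed call fails the quality check
    return "".join(sorted(alleles))


def harmonize(records: List[dict], sample_key: str, snp_key: str,
              genotype_key: str) -> Dict[Key, str]:
    """Map one source's records onto a common (sample, SNP) -> genotype layout."""
    cleaned: Dict[Key, str] = {}
    for record in records:
        genotype = normalize_genotype(record[genotype_key])
        if genotype is not None:
            cleaned[(record[sample_key], record[snp_key])] = genotype
    return cleaned


def merge_into(existing: Dict[Key, str],
               new: Dict[Key, str]) -> Tuple[Dict[Key, str], List[Key]]:
    """Merge new calls into a copy of the dataset and report conflicting keys."""
    merged = dict(existing)
    conflicts: List[Key] = []
    for key, genotype in new.items():
        if key in merged and merged[key] != genotype:
            conflicts.append(key)  # keep the existing call, flag for review
        else:
            merged[key] = genotype
    return merged, conflicts


# Hypothetical input from two laboratories with different conventions.
lab_a = [{"sample": "dog1", "snp": "snp1", "gt": "A/G"},
         {"sample": "dog2", "snp": "snp1", "gt": "--"}]
lab_b = [{"SampleID": "dog3", "Marker": "snp1", "Call": "ga"}]

dataset = harmonize(lab_a, "sample", "snp", "gt")
dataset, conflicts = merge_into(dataset, harmonize(lab_b, "SampleID", "Marker", "Call"))
```

In this sketch the missing call for `dog2` is dropped during cleaning, and conflicting calls are reported rather than silently overwritten, so a person can decide which source to trust.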
What can researchers do with this data?
They can use it to analyze how genetically similar or distant animals within the dataset are and determine familial relationships. They can also extract data of specific breeds to conduct genetic research. The program is intuitive; you click on what you need, and the output is generated. Without these scripts, all users would need programming knowledge to retrieve the information, which is unrealistic.
Our research group primarily uses this data to identify links between specific genomic regions and breed-related diseases. The ultimate goal is to pinpoint DNA variants responsible for increased disease risks, enabling the development of DNA tests. Breeders can then screen animals for these risks and avoid breeding with carriers of high-risk mutations.
Can you give an example?
Since we have accumulated so much data, we can check our database to see if a discovered DNA variant is truly rare. For instance, suppose we’re investigating a neurological condition like epilepsy in a specific breed and identify a DNA variant that seems causative. If that variant is also common in the general dog population, we must question whether it’s truly the causative mutation. The program allows us to set a maximum frequency threshold for variants in the database, automatically filtering out non-relevant ones.
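The frequency filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Galaxy tool itself: the `Variant` class, its fields, the function names, and the 1% threshold are all hypothetical, and the frequency is computed as the variant allele count divided by the number of alleles genotyped in the reference dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    variant_id: str        # hypothetical identifier
    alt_allele_count: int  # copies of the variant allele seen in the database
    total_alleles: int     # alleles genotyped at this position in the database


def allele_frequency(variant: Variant) -> float:
    """Frequency of the variant allele in the reference database."""
    if variant.total_alleles == 0:
        raise ValueError(f"{variant.variant_id} has no genotyped alleles")
    return variant.alt_allele_count / variant.total_alleles


def filter_rare(variants: List[Variant], max_frequency: float) -> List[Variant]:
    """Keep only variants at or below the maximum frequency threshold."""
    return [v for v in variants if allele_frequency(v) <= max_frequency]


# Hypothetical candidates found in affected dogs of one breed.
candidates = [
    Variant("var_A", alt_allele_count=3, total_alleles=4000),
    Variant("var_B", alt_allele_count=1200, total_alleles=4000),
]

# Assumed threshold of 1%; a suitable value depends on the disease and breed.
rare_candidates = filter_rare(candidates, max_frequency=0.01)
```

Here `var_B` is carried by a large share of the general population, so it is dropped as an unlikely cause of a rare condition, while `var_A` remains a candidate.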
What is your ultimate goal with this work?
Our aim is to make it relatively easy for researchers to clean data from specific dog populations and work with high-quality data. We can also add more control animals of the same breed from the larger database, eliminating the need for manual or programming-intensive processes. We want the platform to be so user-friendly that even those who are not tech-savvy can use it. We're already seeing great progress in this regard.
Who are you collaborating with?
We work closely with the Research Engineers from Information and Technology Services (ITS). My colleague Frank van Steenbeek responded to their Autumn Call for Research Engineering projects. The concept is that they adopt and help execute a project for a research group at Utrecht University. Our project is their pilot, and it’s going very well! They are highly skilled people who remain available for questions even after the pilot, which is incredibly helpful. Additionally, René Adelerhof, the ICT Consultant for Veterinary Medicine, works with us to identify potential future collaborations within the faculty. After all, everyone deals with data.
Do you already see potential for other applications?
Within the department, we already collaborate extensively on data from the OnGo research group. There are also opportunities for research on farm animals and horses, where significant genetic data is collected. Beyond data management and processing, the Research Engineers offer many services. I’d encourage others to make use of those!
What is your view on the future of genetic research and big data?
I see numerous opportunities for developing scripts, such as filtering out common mutations in large databases or comparing mutations in dogs diagnosed with genetic diseases versus dogs without that disease.
We already have other applications, like PetScan, a system for registering diagnoses, and Fit2Breed, a matching app for healthy breeding practices. Those systems are up and running. For PetScan, I perform data analyses and create visualizations. For Fit2Breed, I process SNP data to calculate inbreeding levels for individual dogs. This allows us to estimate genetic relationships between dogs. Currently, it’s only available for Dutch Kooikerhondjes, but we’re expanding to more breeds. We’re proud that this platform is already used worldwide by Kooikerhondje breeders.
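To give a feel for what an inbreeding level derived from SNP data can look like, the sketch below computes the fraction of homozygous SNP calls for one dog. This is a deliberately crude proxy chosen for illustration and is not necessarily the measure Fit2Breed uses; the function name and the two-letter genotype encoding are assumptions.

```python
from typing import List, Optional


def homozygosity_fraction(genotypes: List[Optional[str]]) -> float:
    """Fraction of called SNPs at which both alleles are identical.

    Assumption: each genotype is a two-letter string such as "AA" or "AG",
    and None marks a missing call. A higher value suggests more inbreeding,
    but this simple proxy ignores allele frequencies and genomic position.
    """
    called = [g for g in genotypes if g is not None and len(g) == 2]
    if not called:
        raise ValueError("no called genotypes")
    homozygous = sum(1 for g in called if g[0] == g[1])
    return homozygous / len(called)


# Hypothetical SNP calls for a single dog; one call is missing.
dog_calls = ["AA", "AG", "GG", "CC", None]
level = homozygosity_fraction(dog_calls)
```

For these five hypothetical calls, three of the four usable SNPs are homozygous, so `level` is 0.75.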
We’re also developing outcross modules to estimate the percentage of various breeds in a dog. This way, we can promote healthier breeding practices!
What about your own future?
The VetGalaxy pilot ends in March 2025, but I hope to continue working on such optimization projects for a long time. Many people start in one job and move on to better ones. Well, I love animals, genetics, and informatics, so I’ve already found my perfect combination. What more could I ask for?