‘Everything we develop is open, always’
Stories from the Lab: the AI-aided Knowledge Discovery Lab
In his free time, he organised the Dutch National Championship for skating backwards, but during working hours, Jonathan de Bruin only moves forward – with open science, that is. De Bruin and the research engineers at the AI-aided Knowledge Discovery Lab are crafting clever algorithms to help researchers swiftly sift through vast amounts of text files. The software they create is always open, inviting others to build upon their innovations.
Seven years ago, Jonathan de Bruin, a TU Delft alumnus, became the first research engineer in the central IT department (ITS) at Utrecht University. The university had decided to invest more in Research Data Management, and De Bruin played a pivotal part in establishing an entire team of research engineers. This team provides support, guidance, and tailored solutions to researchers who need to manage and analyse their research data effectively.
“Scientific research is becoming increasingly IT intensive,” De Bruin notes regarding the necessity of such a team. “There's more research data, and we're working with ever more complex computational and computer models. The rise of artificial intelligence further expands the possibilities for data analysis and pattern recognition. That means we need not only significant knowledge and expertise, but also an IT infrastructure tailored specifically for such research. By approaching it from a central level, we can facilitate knowledge sharing and better monitor the activities within the faculties.”
Researchers already work with technology, and many are quite proficient in it, De Bruin acknowledges. “But technological development is moving at such a rapid pace that keeping up with it is essentially a full-time job. It's nearly impossible to combine that with writing research proposals, conducting studies, and publishing results.” Research engineers serve as a bridge between the world of research and IT solutions. It's a role that not only brings great satisfaction to De Bruin but is also one in which, according to colleagues, he excels. “He has an uncanny ability to understand what a researcher truly needs,” says Rens van de Schoot, professor and director of the Disc-AI Lab, with whom De Bruin collaborates closely. “I always try to be one step ahead of the researcher and have the tools developed before they even ask for them,” De Bruin admits. “This way, researchers can focus more on substantive questions, while we can continue to rapidly advance the technology.”
At the Disc-AI Lab, researchers and research engineers collaborate to determine which AI solutions, ranging from advanced software to computer model development, are needed for specific research projects. “Our research lab primarily focuses on extracting knowledge from vast amounts of text, such as scientific publications. We explore how machine learning can assist in systematically reviewing publications. During such reviews, researchers manually screen thousands of publication titles and abstracts, only to find out months later which papers are relevant and which are not. This process could be expedited by training an algorithm to search for specific content.”
De Bruin and the research engineers at the lab achieve this using a form of machine learning called active learning. In this process, there’s an interaction between the model and humans, with the interaction serving to guide the model in the right direction. “You can liken this interaction to the yes-no games you used to play, where you had to figure out what the other person meant by asking clever questions, and they could only answer with yes or no. For instance, if a researcher wants to find all relevant scientific literature for a specific research question, the algorithm poses questions to the researcher, such as whether a particular publication is relevant. Based on the researcher’s response, the algorithm makes an intelligent decision about the next step. In the end, the algorithm selects the papers that are relevant for the research from that vast stack.”
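The question-and-answer loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's actual software: the word-count scoring model, the function names (`relevance_score`, `screen`, `ask_reviewer`), the stopping rule of two irrelevant papers in a row, and the toy abstracts are all hypothetical and chosen only to keep the example short.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words."""
    return text.lower().split()


def relevance_score(text, relevant_counts, irrelevant_counts):
    """Smoothed log-odds that a text is relevant, based on word counts so far."""
    rel_total = sum(relevant_counts.values())
    irr_total = sum(irrelevant_counts.values())
    vocab = len(set(relevant_counts) | set(irrelevant_counts)) + 1
    score = 0.0
    for word in tokenize(text):
        p_rel = (relevant_counts[word] + 1) / (rel_total + vocab)
        p_irr = (irrelevant_counts[word] + 1) / (irr_total + vocab)
        score += math.log(p_rel / p_irr)
    return score


def screen(abstracts, ask_reviewer, seed_index, max_irrelevant_in_a_row=2):
    """Ask the reviewer about one abstract at a time, learning from each answer.

    Returns the indices judged relevant and the number of abstracts screened.
    The stopping rule is an assumption made for this sketch.
    """
    relevant_counts, irrelevant_counts = Counter(), Counter()
    unlabeled = set(range(len(abstracts)))
    relevant, screened, misses = [], 0, 0
    next_index = seed_index
    while unlabeled and misses < max_irrelevant_in_a_row:
        unlabeled.discard(next_index)
        screened += 1
        words = tokenize(abstracts[next_index])
        if ask_reviewer(abstracts[next_index]):  # the "yes" answer
            relevant.append(next_index)
            relevant_counts.update(words)
            misses = 0
        else:  # the "no" answer
            irrelevant_counts.update(words)
            misses += 1
        if unlabeled:
            # Next question: the abstract the model now rates most likely relevant.
            next_index = max(
                sorted(unlabeled),
                key=lambda i: relevance_score(
                    abstracts[i], relevant_counts, irrelevant_counts
                ),
            )
    return relevant, screened


# Toy data; a simple keyword check stands in for the human reviewer.
abstracts = [
    "ptsd risk factors after trauma",
    "skating backwards championship results",
    "trauma exposure and ptsd prevention",
    "patent law court rulings",
    "ptsd symptoms in trauma survivors",
    "medication dosing for children",
]
found, screened = screen(abstracts, lambda text: "ptsd" in text, seed_index=0)
print(found, screened)
```

In this toy run the reviewer starts from one known relevant paper, and the model keeps proposing the paper it considers most promising, so the relevant ones surface early and the reviewer can stop before reading the whole stack.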
This not only saves an enormous amount of time but also a considerable amount of money, De Bruin emphasises. “I estimate that within our university, nearly a million euros in personnel costs are spent annually on systematic reviews. Systematic reviews are a crucial part of these researchers’ projects, but they consume a lot of resources and energy. Moreover, they are not always the most enjoyable aspect of their work, and researchers would rather spend their time differently. Thanks to AI, this part can be accomplished much faster and with higher quality.”
Moreover, the societal relevance of such AI techniques is readily apparent. For example, one project at the lab focuses on how expedited systematic reviews can lead to improved medication dosing for children. In another project, researchers examine how systematic reviews of risk factors can contribute to preventing PTSD in individuals who have experienced traumatic events. The software, ASReview (short for Active Learning for Systematic Reviews), is not limited to scientific literature: it can also be employed for systematic reviews of court rulings, patents, policy documents, emails, or social media posts.
Join and contribute
What’s particularly appealing about the software developed at the Disc-AI Lab, according to De Bruin, is that it is entirely open source. “Everything we develop is open, always, from the very first line of code. Open sharing means that society as a whole can benefit from your research. This open strategy enables other parties and external developers to quickly join and contribute. This can take various forms: adding a piece of code, reporting something that doesn't work well, or enhancing documentation. Beyond the university’s walls, there's a vast pool of knowledge. By collaborating, we can be a worthy counterbalance to big tech companies. We don't want these companies to dominate research into the multitude of AI applications.”
According to De Bruin, the university has a significant role to play in this regard. In addition to his work at the lab, he serves as the project leader for the ‘FAIR Data & Software’ track within Utrecht University’s Open Science program. In this role, he aims to garner more attention within the university for open data, open software, and the prerequisites necessary to establish a more open research cycle. “The ultimate goal is that all the data and software you collect and develop can be reused, resulting in reliable, transparent, efficient, and impactful science.”
When it comes to open-source development, De Bruin believes that the university’s ambitions cannot be set too high. “I consider it of utmost importance for the university to actively participate in the open-source community, especially in the realm of AI. When you share everything and collaborate, you can make great strides and enhance science.”
De Bruin also argues that recognition and rewards for similar efforts should match those given to researchers for publishing their research. “It takes a lot of time and effort to openly publish a valuable dataset or piece of software. So, during performance evaluations, no one should hear that they could have written an extra paper instead. Data and software can have a tremendous impact, and the university is increasingly realising this. Therefore, we must take steps now and invest in the knowledge and expertise needed to keep AI open and transparent.”
More about Jonathan de Bruin
- Winner of the Dutch Data Prize 2020 with the project ‘CoronaWatchNL’
De Bruin collected all data on COVID-19 infections and fatalities in the Netherlands from the RIVM (National Institute for Public Health and the Environment) and hospitals. This dataset was openly accessible and reusable for the entire research community.
- Winner of the skate competition ‘Vondelpark Cup’ in the freestyle slalom skating category and organiser of the open Dutch National Championship for skating backwards.
- De Bruin initiated the development of open-source software for his master’s thesis. The software he created has been installed millions of times and has been recognised by the Python Software Foundation as crucial software.