Data scientist is the sexiest job of the 21st century

The Applied Data Science research focus area has been around for over four years. Peter van der Heijden, Professor of Statistics at the Faculty of Social and Behavioural Sciences, and Assistant Professor Sara van Erp, both of whom are pioneers in this focus area, explain how data science offers researchers new opportunities.

Twenty years ago it seemed unthinkable: a computer that predicts incidents of aggression among patients in a psychiatric ward. And yet this is now a reality. Using machine learning, a data science technique for making predictions, the computer can analyse nursing staff handover reports to predict whether there will be violent incidents the next day. This is just one of many examples of data science in practice. Peter: ‘Because computers have become so much faster, far more is possible in the field of data analysis than was the case some 20 years ago. Much more data is available, and types of data that could not be analysed in the past, such as text, video and audio, now can be.’
 

Peter van der Heijden

Different fields

The research focus area began some four years ago, when the Master’s in Applied Data Science was developed, with the aim of also connecting research in this field across the university. Peter: ‘The people involved in the Master’s programme came from very different fields, including the medical world, computer science, geosciences, humanities and social sciences. Everyone agreed that UU should not miss the boat on data science applications, both in research and in clinical practice.’
 

Collaboration across faculties

The aim of the focus area is to help initiate interfaculty collaboration between researchers who use or would like to use data science. ‘The added value of this research focus area is really that it transcends faculty boundaries. People know how to find each other within a faculty, but it's more complicated between faculties. Take text analysis as an example: people work on this in the humanities, computer science and social sciences, as well as at libraries and hospitals. We are trying to get these different groups of researchers to help each other.’
 

Sara van Erp

Sexy field

The explosive growth of the Master’s programme is already an indication that we will no longer be able to ignore data science in the future. The programme had 80 students enrolled in 2020, and this year there are already 200. ‘That is quite a lot for a Master’s programme. It shows what people have been saying for some time: that data scientist is the sexiest job of the 21st century’, Peter says.
 

Applications in the field

Peter: ‘I think that we at UU should give researchers tools and guidelines to apply data science in their own fields. We should also continue to develop data science methods through fundamental research.’
 

Algorithm looks for brain abnormalities

UMC Utrecht already has experience with the application of data science. In the life sciences, many data science methods are developed through competitions. One of these competitions involved developing an algorithm that can best detect brain abnormalities in images taken by the Radiology Department. The winning algorithm is now being applied in practice. The major advantage is that the doctor does not have to assess the images on their own, but can do so together with the machine. Doctors like the idea of having someone – in this case a machine – look over the images with them. 
 

Encouraging and facilitating

The field of cultural anthropology mainly involves qualitative research; data science is not the first thing that comes to mind. Nevertheless, Peter says, a cultural anthropology project was carried out within this research focus area. Anthropologists were able to search through large amounts of Twitter data to identify a network structure of environmental activists. Data science techniques make this much easier, but anthropologists do not always have the necessary expertise. ‘What you need for this type of project is the subject-matter expertise of the anthropologists, the methodology of data science and a programmer who can put data into an analysable format. We help bring together such a multidisciplinary team. This is the way research should increasingly work. I think that, as a university, we should encourage and facilitate this wherever possible’, says Peter.

Who wrote the Dutch national anthem?

Data science can also help the humanities. Utrecht University collaborated on a project to identify the author of ‘Wilhelmus’, the Dutch national anthem. Using new computational techniques, researchers from Antwerp, Amsterdam and Utrecht tracked down a possible author: Petrus Datheen, who had never before been mentioned as a candidate in traditional research. The analysis of certain authors’ word patterns and language use is not new, but computational techniques enabled the researchers to look at other word patterns – namely, the distribution of words that everyone uses unconsciously (articles, conjunctions, etc.).
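The idea of comparing authors by the words they use unconsciously can be illustrated with a minimal sketch. This is not the method the research team used; it only shows the general principle. The word list, the function names and the choice of cosine similarity are all assumptions made for illustration, and a real study would use far longer texts and a word list in the language of the texts.

```python
import math
from collections import Counter

# Hypothetical, very short list of English function words (assumption).
FUNCTION_WORDS = ["the", "a", "an", "and", "or", "but", "of", "in", "to", "that"]


def function_word_profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = [t.strip(".,;:!?'\"()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(p, q):
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def closest_candidate(anonymous_text, candidates):
    """Return the candidate whose function-word profile is most similar.

    candidates maps an author name to a sample of that author's writing.
    """
    target = function_word_profile(anonymous_text)
    return max(
        candidates,
        key=lambda name: cosine_similarity(
            target, function_word_profile(candidates[name])
        ),
    )
```

Given an anonymous text and a dictionary of writing samples per candidate author, closest_candidate returns the name whose use of these small words most resembles the anonymous text. The outcome is only a suggestion of a possible author, never proof of authorship.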
 

Don’t depend on what you can already do

It’s clear that a lot is already happening. But when will the work in this research focus area be finished? Will it ever be finished? Peter: ‘We have no idea where data science will end up. If I relate it to my own field and you ask me when I will be finished, I think the answer is: never. Because statistical and data science techniques are still evolving, researchers’ questions are evolving as well, which means that new techniques are constantly emerging.’ In other words, there is plenty more work to be done in this focus area. Peter: ‘When formulating a research question, a researcher ideally shouldn’t depend too much on what they have already mastered. It’s better if you tackle the content of the question head-on and then bring in the methodology and expertise through others. That fits in with the current trend of working in teams.’
 

Ask different questions

Sara can relate to this: ‘Researchers these days still mostly stick to what they know. The focus area can help researchers understand the other methodologies and data science techniques that exist and how they can be applied. Through this focus area, we aim to inspire researchers to ask different questions. We do this by sharing experiences in meetings and, more importantly, by making funding available through the research focus area. For every grant we provide, the project involves a researcher from a particular field, someone who is familiar with data science methodology, and a programmer.’ Peter: ‘We also want to build a bridge between researchers and support initiatives at UU, such as FAIR Research IT. Our research focus area helps both parties find each other more easily.’
 

Hackathon

Statistics Professor Rens van der Schoot organised a hackathon to investigate the communication between multinational oil and gas company Shell and the Dutch government. As part of this hackathon, Follow the Money (FTM) collected numerous documents. But how do you find exactly what you’re looking for in thousands of documents, without having to read them all? Active learning could help FTM find the relevant documents sooner. This hackathon also marked the launch of the Special Interest Group for Active Learning (SIG-AL), which brings together all aspects of active learning.
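How active learning can bring relevant documents to the top of the pile can be shown with a small sketch. It is not the software used in the hackathon: the word-count scoring, the function names and the is_relevant callback (which stands in for the human reader) are assumptions chosen to keep the example short. The principle is that after every document the reader labels, the model re-ranks the unread documents and presents the most promising one next.

```python
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words without surrounding punctuation."""
    tokens = [t.strip(".,;:!?'\"()").lower() for t in text.split()]
    return [t for t in tokens if t]


def score(doc, relevant_words, irrelevant_words):
    """Higher score: the document resembles those labelled relevant so far."""
    return sum(relevant_words[w] - irrelevant_words[w] for w in set(tokenize(doc)))


def screen(documents, is_relevant, seed_index):
    """Read all documents, always picking the highest-ranked unread one next.

    documents:   list of document texts
    is_relevant: callable taking a document index; stands in for the human reader
    seed_index:  index of the document the reader starts with
    Returns the indices in the order in which the documents were read.
    """
    relevant_words, irrelevant_words = Counter(), Counter()
    unread = set(range(len(documents)))
    order = []
    next_index = seed_index
    while True:
        unread.remove(next_index)
        order.append(next_index)
        # The reader's label updates the simple word-count model.
        target = relevant_words if is_relevant(next_index) else irrelevant_words
        target.update(set(tokenize(documents[next_index])))
        if not unread:
            return order
        # Re-rank what is left; ties go to the lowest index.
        next_index = max(
            sorted(unread),
            key=lambda i: score(documents[i], relevant_words, irrelevant_words),
        )
```

In practice the reader would stop once several documents in a row turn out to be irrelevant, rather than reading everything; the sketch reads to the end only so that the full reading order can be inspected.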