From sharing code to publishing open software

Publishing impact according to Rens van de Schoot, Lars Tummers and Jonathan de Bruin

Rens van de Schoot, Professor of Statistics for Small Data Sets, Lars Tummers, Professor of Public Management and Behaviour, and Jonathan de Bruin, research engineer at ITS, collaborate in the Utrecht University research project ASReview. They research the use of machine learning in systematic reviews and adhere to the principles of Open Science. Together they are taking the step from sharing code to publishing open software.

What is your definition of Open Science?

Rens: "Open Science used to mean the same as Open Access. By now, all these open access deals have helped to crystallize the concept, at least as far as the Netherlands is concerned. I have already been asked to be a co-author purely so that my open access options could be used. We could not have predicted this role of funding author some years ago, when we made a video about authors’ roles with the Young Academy.

Now Open Science to me is mainly sharing teaching material and sharing code. I try to put all my teaching material on my website, available to everybody. I have thousands of visitors every week. For me, sharing code began a number of years ago when PhD candidates started to add their R code as an appendix to their journal articles. Around that time I met Jonathan, who told me that I should publish my code on GitHub, with version control and under a licence."

So now you share all your code, open to anyone?

"Well, I have had some sleepless nights; it is scary as hell. I would rather share everything under a licence that makes sure that everything remains mine and no one else can make money from it. But Jonathan convinced me that it must be published completely open."

Jonathan: "Open sharing means that all of society may benefit from your research. This also means that a company can be started based on your code. That profits may be made. That your idea can be developed further in a closed environment. But still, open sharing is necessary, because it may attract unexpected visitors, not only from the academic world, but also from the business world. There is lots of talent to be found in the commercial environment. If you can persuade them to contribute to the open source project you are carrying out in the academic world, this may lead to a very beautiful collaboration."

What does this mean for your project, ASReview?

"In ASReview we investigate the use of machine learning in systematic reviews. Researchers who have to perform a systematic review must screen thousands of titles and abstracts of publications to select an often very small percentage of relevant papers. We have developed a tool that shows the researcher papers in order of relevance. The order in which the papers are shown is continually adjusted by the choices the researcher makes: include or exclude.

We need that tool in order to carry out research, but it does not lose its importance once our research has finished. By involving as many parties as possible in our project now, we make it viable. Other parties can take ASReview further, even after it is no longer challenging from an academic point of view.

A semi-government body such as the European Food Safety Authority (EFSA) has already developed code for our project. That collaboration would not have come about if we had not shared our project openly.

We were also able to collaborate with the Allen Institute for AI, which has compiled an open dataset of publications about COVID-19. With the help of ASReview you can search this set of nearly 300,000 publications. And in the middle of the corona crisis, this resulted in a collaboration with the Dutch Federation of Medical Specialists, which wants to investigate whether it can use ASReview to bring medical guidelines up to date based on the most recent scientific insights. This means another chance for us to carry out further research."
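
The screening loop described above, in which the order of the papers is adjusted by every include or exclude decision, can be sketched roughly as follows. This is a minimal illustration in Python and not ASReview's actual implementation: the word-count scoring, the function names and the sample titles are all assumptions made for the example, and a real tool would use a proper machine learning model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a title or abstract and split it into words."""
    return text.lower().split()


def relevance(text, included_words, excluded_words):
    """Toy score (an assumption): words from included papers count for,
    words from excluded papers count against."""
    return sum(included_words[w] - excluded_words[w] for w in tokenize(text))


def screen(papers, is_relevant, budget):
    """Show papers one at a time, re-ranking after every decision.

    papers: titles still to be screened
    is_relevant: callable standing in for the human reviewer's choice
    budget: maximum number of papers the reviewer is willing to read
    """
    included_words, excluded_words = Counter(), Counter()
    unseen = list(papers)
    included = []
    for _ in range(min(budget, len(unseen))):
        # Pick the unseen paper that currently looks most relevant.
        best = max(unseen, key=lambda p: relevance(p, included_words, excluded_words))
        unseen.remove(best)
        if is_relevant(best):
            included.append(best)
            included_words.update(tokenize(best))
        else:
            excluded_words.update(tokenize(best))
    return included


# Hypothetical titles; the lambda plays the role of the reviewer.
papers = [
    "machine learning for systematic reviews",
    "soil chemistry of volcanic islands",
    "active learning to screen abstracts in systematic reviews",
    "volcanic soil erosion",
    "screening abstracts with machine learning",
]
found = screen(papers, lambda p: "review" in p or "screen" in p, budget=3)
print(found)
```

In this toy run the reviewer reads only three of the five titles, because each decision pushes similar papers up the list.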

So you want to let go of ASReview at some point?

Rens: "As a scientist you need to valorise academic knowledge at some point, to make your knowledge available so it can be put into practice. But what if it turns out that this knowledge can actually be used? Who should be responsible for doing so? You may wonder whether this is a task for the university. After all, we are not software producers, are we?"

Jonathan: "We are encouraged to share code and software openly and make reuse possible. But the moment you succeed, you have a problem as a researcher. Your name is attached to it and you must maintain it and develop it further. That takes up a lot of time, but the options for funding have dried up, because the project is no longer considered innovative."

Rens: "That is why we need to discuss incentives and rewards."

What should change in the incentives and rewards for researchers?

"A while ago, the Young Academy published a position paper (in Dutch only) about the new incentives and rewards. For instance, you should appreciate that someone keeps software up and running and as a result publishes fewer academic papers. We have just written 150 pages of documentation, which I cannot register as research output in Pure. The code must be clean, annotated and well described. That is a lot of work, but it makes the difference between a script for your own use and a script that can be reused by anyone. Are we going to accept that a PhD candidate delivers one academic publication fewer, but makes their code ready for reuse?"

Jonathan: "In fact, we are talking here about the difference between code and software. Code is a script you add to your publication as an appendix. Software is code that has been made suitable for reuse. The boundaries are blurry, but the step from code to software is quite big."

Rens: "Sharing code is the very least you should do as a researcher when we are talking about transparency. But to make the step to open science and sharing software you really need to look at incentives and rewards. I lead a lot of projects in which open science plays only a small part. I work with young researchers who can’t afford to do something for which they will be rewarded only indirectly. They need these changes."

How do you cooperate as researcher and research engineer?

"Incentives and rewards are a major topic, but the university is also busy arranging support. Experts from the library and ITS are a great help. A researcher cannot do this alone."

Jonathan: "You cannot do this research on your own. It is just not possible to be a good researcher and a good programmer at the same time. They are two completely different professions. But we do need to understand each other. As a programmer I need to understand how research works."

Lars Tummers joins the discussion later and recognizes a lot of what has already been said: "For many researchers who carry out applied research, this is unknown territory. They really need help with this. If you publish a study, are you also responsible for replication of the theory in your field? That is what we should be heading for, that is cumulative science! Often there is no follow-up, and someone proceeds with something slightly different. If you document your data and code, other researchers can build on your research. You must be very open and transparent, but you must have some help with this.

Many researchers don’t know this kind of support exists. Make data engineers PhD candidates too and let them work in a department. Why can’t a doctoral thesis consist of several data papers? My research proposal for the application of machine learning in nudging has just been approved. I wrote the part about nudging, a data engineer wrote the part about machine learning, and we do the research together. I don’t see the distinction between research and support; what that data engineer is doing is also a form of science."
