28 March 2019

How using High Performance Computing can reduce computation time from three months to one day

What does Research Data Management mean in everyday practice? In this series of interviews by RDM Support, researchers share their experiences with various aspects of research data management. In this interview, PhD candidate Mariana Simões talks about High Performance Computing (HPC).

What is the effect of pesticides on the health of surrounding residents? Mariana Simões is devoting her PhD to this question. To find answers, Mariana performs extensive analyses to combine data from all kinds of institutes.

“I started my PhD at the Institute for Risk Assessment Sciences (IRAS) in 2016. I'm looking specifically at the people that live near fields with crops where pesticides are applied. I want to find out if these people have more health problems than people who live further away. My PhD contract has a duration of three years, instead of four years, because I could work with existing data. For instance, I use data from Landelijk Grondgebruiksbestand Nederland and Basisregistratie Gewaspercelen. I use their crop maps for data on types of crops in agricultural fields in the Netherlands. To prepare this data for my analyses, I had to calculate the area of 29 specific types of crops in 4 different buffer sizes around each house in the Netherlands.”
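To give a sense of what this preparation step involves, here is a minimal Python sketch of the calculation for a single house. It is not Mariana's actual script (the scripts described later in the interview were written in R). It assumes the crop map is available as a grid of square cells, each holding a crop code, and counts a cell towards a buffer when its centre lies within the buffer radius. The function name, the tiny example grid, the crop names and the two radii are all hypothetical.

```python
from collections import defaultdict
from math import hypot


def crop_area_in_buffers(grid, cell_size, house_xy, radii):
    """Return {radius: {crop_code: area}} for one house.

    grid[row][col] holds a crop code, or None where no crop is grown.
    The centre of cell (row, col) is assumed to lie at
    ((col + 0.5) * cell_size, (row + 0.5) * cell_size).
    A cell counts towards a buffer if its centre is within the radius.
    """
    hx, hy = house_xy
    cell_area = cell_size * cell_size
    areas = {r: defaultdict(float) for r in radii}
    for row, line in enumerate(grid):
        for col, crop in enumerate(line):
            if crop is None:
                continue
            x = (col + 0.5) * cell_size
            y = (row + 0.5) * cell_size
            dist = hypot(x - hx, y - hy)
            for r in radii:
                if dist <= r:
                    areas[r][crop] += cell_area
    return {r: dict(a) for r, a in areas.items()}


# Hypothetical 4 x 4 crop map with 10 m cells and a house in the middle.
example_grid = [
    ["maize", "maize", "potato", "potato"],
    ["maize", "maize", "potato", "potato"],
    [None, "bulbs", "bulbs", None],
    [None, "bulbs", "bulbs", None],
]
result = crop_area_in_buffers(example_grid, 10, (20, 20), radii=[10, 20])
print(result)
```

Repeating a loop like this for every house in the country, for 29 crop types, four buffer sizes and several years, is what makes the total workload so large.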

My computations took a bit longer than I thought…
Mariana Simões, Photos by Annemiek van der Kuil | PhotoA.nl

Nine months, day and night

“But these computations took a bit longer than I thought. It took me nine months to calculate this for the years 2009 to 2014. At this point I was already using three workstations, running day and night. After four months I started to panic, and I asked my supervisor to please give me a fourth workstation. But even with the power of four computers it was still taking a lot of time, and I had to do many more of these calculations. Then Kees van Eijden, HPC expert at Research Data Management Support and ITS, happened to send a general email to our institute saying: ‘If you work with a lot of data or you have really complex computations, you can contact me to make it more efficient.’ So I immediately said that I was interested in learning more, because it was really taking a long time for me to do all my computations.”


High Performance Computing

“I had no idea how Kees could help me, so I just said: ‘OK, this is what I have.’ I could see in his face that he already had quite a few ideas. The solution for my problem was High Performance Computing (HPC), a computer cluster with the computational power of hundreds of workstations. This computer at SURFsara can be accessed remotely from your own workplace, using a secure connection. Kees helped me to get my scripts ready for this HPC computer. Using HPC has reduced my computation time per year of data from three months to only one day. Because I have to do these computations for more than ten years of data, it saves me a lot of time.”


“At the institute, we were using Stata software for analyses. Kees rewrote these Stata scripts into fast and efficient R code. He wrote all the scripts that were needed to run the analysis on the HPC computer. And of course I had to learn how to do it. For every year of data in the crop maps I need to do a separate analysis, and now I can change the scripts myself. Also, when I’m finished with my PhD, someone else will have to carry on. Now with the help of Kees we have one script to do all the computations with an HPC computer. This is great for further research.”
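Because each year of data is analysed separately, the years can be processed independently of one another, which is what makes this kind of work a good fit for a cluster. The Python sketch below shows one way a single script could be reused for every year. It is an illustration only, not the R scripts Kees wrote. It assumes the cluster uses a scheduler with job arrays that passes each task its index through the `SLURM_ARRAY_TASK_ID` environment variable (as SLURM does); the interview does not say which scheduler was used. `analyse_year` and the output file name are hypothetical placeholders.

```python
import os

# The years named earlier in the interview: 2009 to 2014.
YEARS = list(range(2009, 2015))


def year_for_task(task_id, years=YEARS):
    """Map a zero-based array-task index to the year it should process."""
    if not 0 <= task_id < len(years):
        raise ValueError(f"task id {task_id} outside 0..{len(years) - 1}")
    return years[task_id]


def analyse_year(year):
    """Hypothetical placeholder for the real per-year crop-area analysis."""
    return f"crop_areas_{year}.csv"


def main(environ=os.environ):
    # Assumption: the scheduler sets SLURM_ARRAY_TASK_ID for each array task.
    # Default to task 0 so the script also runs on an ordinary workstation.
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return analyse_year(year_for_task(task_id))


if __name__ == "__main__":
    print(main())
```

With a set-up like this, adding a new year of crop maps means extending the list of years rather than writing a new script, and whoever continues the research can run the same file.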

HPC is not hard, you can do it and we are here to help you

Saved soul

“Kees wanted to know if my colleagues might be interested in HPC. So, I asked him to present something about HPC during one of our weekly lunch seminars at the institute. Because so many people showed up and had so many questions, Kees had the idea to organise a workshop about HPC. As I was planning this workshop with him, I asked him what the minimum number of participants was for doing the workshop. And he told me: ‘Well, I don’t know… Maybe five? Actually, any participant that comes is another soul saved!’”

“In total nine participants joined the workshop, where Kees showed all kinds of different situations where HPC can help make analyses more efficient. The main message of the workshop was that HPC is not that hard, you can do it and the experts from RDM Support are there to help you. As I am working on my research I always keep in touch with Kees. If I write something to use in HPC he is always there to check it for me, and sometimes he comes up with ideas to make it even more efficient. If I have a question he is the first person I ask. If he doesn’t know either, we both contact the SURFsara helpdesk. Kees has been a big help for me and my research.”


Mariana Simões

Mariana came to the Netherlands seven years ago. “After my veterinary medicine degree in Portugal I didn't see myself doing clinics. I really liked this one course on epidemiology, so I wanted to continue in this field. I was looking at the best places to learn more about epidemiology, and one of the options was the Netherlands. My husband had already got a job there, so the choice was very easy. So I did the Master’s programme in epidemiology here in Utrecht.

“After my Master’s I applied for this PhD position at the Institute for Risk Assessment Sciences (IRAS), which I find really interesting. Working with environmental exposures, such as pesticides, also fits within the increasingly fascinating One Health concept, where we integrate the fields of human, veterinary and environmental sciences to reach better health for all. What I like about my research is that you combine data from individual cases and identify risk factors for health, to ultimately understand what is good for populations in general.”

More information

Are you interested in what High Performance Computing can do for your research? Or do you have other questions regarding research data management? Then contact us at Research Data Management Support.