Using data standards enables collaboration and reuse of hydrological data and the ‘Global Water Balance Model’
What does research data management mean in the daily practice of a researcher? In this series of interviews by RDM Support, researchers share their experiences on various aspects of research data management. In this interview, Niko Wanders and Edwin Sutanudjaja at the department of Physical Geography talk about their experience with publishing data on 4TU.ResearchData.
Since 2011, researchers at the Faculty of Geosciences of Utrecht University have been working on a large-scale hydrological model that predicts the flow of precipitation through the earth. With this Global Water Balance Model (PCR-GLOBWB), it becomes possible to trace the flow of one droplet of rain through the different layers of soil and rock until it reaches the sea. The model, produced by the hydrology team at the Physical Geography department, is based on data collected over the last 120 years. It is the work of a team of around 18 people, all of whom have continuously developed it over the last 10 years. This close and effective cooperation led to the current version, PCR-GLOBWB 2 (Sutanudjaja et al. 2018). The model is used by governments, NGOs, researchers and policymakers to run climate simulations and to map the possible impact of rainfall, or the lack of it.
In this article, we zoom in on this groundbreaking initiative. We interviewed assistant professor Niko Wanders, part of the development team, and Edwin Sutanudjaja, one of the core developers of the model and the main point of contact for its user community. Together with professor Marc Bierkens and assistant professor Rens van Beek, they coordinate the development of PCR-GLOBWB. The dataset, in NetCDF format, can be freely consulted through the 4TU.ResearchData OPeNDAP server. We asked Niko and Edwin about their efforts to make the data used to build this model openly accessible and FAIR.
The impact of the research
The Global Water Balance Model (PCR-GLOBWB) has a clear impact on policy-making, climate predictions and other researchers. The Dutch government, the World Resources Institute and the European Commission are just a few examples of those who benefit from the predictions and simulations made by the project team.
For instance, the model can help to predict the likelihood and impact of flooding in specific areas under a temperature increase of 2 or 3 degrees. The model can produce forecasts based on a wide variety of variables, such as temperature, evaporation rates, elevation, discharge and soil moisture. Recently, the model has also been extended to account for variables related to human intervention, such as the impact of reservoirs, irrigation and livestock water use.
Niko underlines that the project team works very hard to maintain the dataset, which is needed to run the model: ‘The dataset is a real community effort. There are a lot of researchers, sometimes also from other universities, who contribute to it. We organize monthly meetings to streamline everything. In these meetings we decide on the way forward, for instance on new attributes and developments, and we make agreements on the funds we would like to apply for. Only with this data can the model be adjusted to answer local questions’.
The published dataset consists of data collected by researchers all across the world. Niko explains: ‘We aggregate data that we receive from all over the globe. This gives a unique view on how the rain moves through the surface, across the earth, towards the sea. The dataset really keeps track of every location in the world, and this is the main reason that other institutions are so keen to use it’. Edwin adds: ‘With the data, we can create monthly forecasts that predict the flow of water for the coming six months. This helps to anticipate upcoming challenges, such as crop failure and energy problems. To that end, we recently also refined the global resolution of our model from 50 kilometres to 10 kilometres, so we can really go into depth and answer very detailed questions’.
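The resolution change Edwin mentions can be put in numbers. The short Python sketch below assumes that the earlier version ran on a 30 arcmin (0.5°) grid and the current one on the 5 arcmin grid named in Sutanudjaja et al. (2018), and uses about 111.32 km per degree of longitude at the equator. It is an illustration of the arithmetic, not part of the model code.

```python
# Approximate length of one degree of longitude at the equator (km).
KM_PER_DEGREE_AT_EQUATOR = 111.32

def grid_summary(res_arcmin):
    """Return (cell width in km at the equator, number of cells in a global grid)."""
    cell_km = res_arcmin / 60 * KM_PER_DEGREE_AT_EQUATOR
    n_rows = 180 * 60 // res_arcmin  # latitude bands from pole to pole
    n_cols = 360 * 60 // res_arcmin  # longitude bands around the globe
    return cell_km, n_rows * n_cols

coarse_km, coarse_cells = grid_summary(30)  # assumed earlier 30 arcmin grid
fine_km, fine_cells = grid_summary(5)       # 5 arcmin grid (PCR-GLOBWB 2)

print(f"30 arcmin: ~{coarse_km:.0f} km cells, {coarse_cells:,} cells")
print(f" 5 arcmin: ~{fine_km:.0f} km cells, {fine_cells:,} cells")
print(f"factor: {fine_cells // coarse_cells}x more cells")
```

Cells of roughly 56 km and 9 km at the equator match the ‘50 to 10 kilometres’ quoted above. Because the grid is refined six times in each direction, every global output field holds 36 times as many values.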
NetCDF data and OPeNDAP
The complete dataset comprises more than 14 TB of data in different formats, a significant portion of which is in NetCDF (Network Common Data Form), a format widely used in climatology and forecasting. NetCDF is a self-describing, machine-independent data format that supports the creation, access, and easy sharing of array-oriented scientific data. As this is an open format, it can easily be used and analyzed with different software tools.
OPeNDAP has opened a lot of opportunities: education, sharing, external processing.
Researchers often contact Edwin to ask for the data underpinning the Global Water Balance Model. Responding to such requests is difficult: the data are far too large to send by email. Downloading them from a standard repository is not an option either, as users would then need to download the full dataset to their personal drives. The OPeNDAP server, offered and supported by 4TU.ResearchData, provides an excellent solution: it makes it possible to easily access and explore parts of the dataset before deciding which part of the data to download.
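A rough calculation shows why extracting only the needed part matters. The Python sketch below assumes uncompressed 32-bit floating-point values, a single monthly variable on a 5 arcmin global grid (2160 × 4320 cells) and a hypothetical regional request of 36 × 54 cells (a 3° × 4.5° box). Real NetCDF files may be compressed, so the absolute numbers are illustrative only.

```python
BYTES_PER_VALUE = 4  # assumption: uncompressed 32-bit floats

def array_bytes(n_time, n_lat, n_lon):
    """Size in bytes of an uncompressed (time, lat, lon) array."""
    return n_time * n_lat * n_lon * BYTES_PER_VALUE

# One year of monthly values for one variable, whole globe (5 arcmin grid).
global_year = array_bytes(12, 2160, 4320)
# The same year for a hypothetical 3 x 4.5 degree box (36 x 54 cells).
regional_year = array_bytes(12, 36, 54)

print(f"global:   {global_year / 1e6:.1f} MB")
print(f"regional: {regional_year / 1e3:.1f} kB")
print(f"ratio:    {global_year // regional_year}")
```

Under these assumptions one variable for one year is about 448 MB globally but only about 93 kB for the regional box, a factor of 4800. Over decades of data and many variables the global figure grows accordingly, while a targeted regional request stays small.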
OPeNDAP creates a lot of opportunities
Niko further underlines the value of OPeNDAP: ‘OPeNDAP is easily accessible and flexible, and it is a piece of cake to adjust and customize the area that you want to study. It makes very short and targeted simulations possible’. For now, Niko and Edwin have published about 250 GB of the complete dataset on OPeNDAP, and they intend to upload more data soon.
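Selecting a study area over OPeNDAP comes down to translating a latitude/longitude box into array indices and sending them as a DAP2 constraint expression of the form variable[start:stride:stop], where stop is inclusive. The Python sketch below is only an illustration: the server URL, the file name and the variable name discharge are hypothetical placeholders, and it assumes a regular 5 arcmin grid with the northernmost row first, the first column at 180° W and dimensions ordered (time, lat, lon). The dataset's own metadata should be checked before relying on any of these assumptions.

```python
import math

def bbox_to_indices(lat_min, lat_max, lon_min, lon_max, res_arcmin=5):
    """Convert a lat/lon box to inclusive (start, stop) row and column indices.

    Assumes a regular global grid: row 0 is the northernmost band
    (latitude decreasing from 90 N) and column 0 starts at 180 W.
    """
    cells_per_degree = 60 / res_arcmin
    n_rows = int(180 * cells_per_degree)
    n_cols = int(360 * cells_per_degree)
    row_start = max(0, math.floor((90 - lat_max) * cells_per_degree))
    row_stop = min(n_rows - 1, math.ceil((90 - lat_min) * cells_per_degree) - 1)
    col_start = max(0, math.floor((lon_min + 180) * cells_per_degree))
    col_stop = min(n_cols - 1, math.ceil((lon_max + 180) * cells_per_degree) - 1)
    return (row_start, row_stop), (col_start, col_stop)

def dap2_ascii_url(base_url, variable, *ranges):
    """Build a DAP2 ASCII request with one [start:1:stop] hyperslab per dimension."""
    hyperslabs = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)
    return f"{base_url}.ascii?{variable}{hyperslabs}"

# Hypothetical dataset URL and variable name, for illustration only.
BASE_URL = "https://example.org/thredds/dodsC/pcrglobwb/discharge_monthly.nc"

rows, cols = bbox_to_indices(50.5, 53.5, 3.0, 7.5)  # roughly the Netherlands
url = dap2_ascii_url(BASE_URL, "discharge", (0, 11), rows, cols)  # first 12 time steps
print(url)
```

For this box the request becomes discharge[0:1:11][438:1:473][2196:1:2249], so only 12 × 36 × 54 values travel over the network. NetCDF-aware clients such as the netCDF4 and xarray Python libraries can typically open an OPeNDAP URL directly and fetch only the slices that are indexed, so users rarely need to build these strings by hand; the explicit version simply shows what such a request contains.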
Making their NetCDF data openly available on the OPeNDAP server of 4TU.ResearchData also has other advantages. Niko states: ‘Sharing the dataset in a standardized format with researchers across the world is essential to further increase its impact. By sharing our model, we allow others to build new knowledge on it. Next to that, the possibility of exploring the dataset in OPeNDAP also opens up a broader community of users, who can now use particular parts of the dataset to answer their own regional and local questions, for instance focussing on developing countries’. Edwin admits: ‘There is a lot of interest in accessing our data, and that is why I am now working on a user manual. It is good to let researchers know how to use the data and how to navigate the overwhelming number of variables that we offer’.
The use of interactive maps
Despite the model’s success, Niko also identified an important challenge: ‘Modelling is not very sexy, so it does not bring in a lot of money from grants. If we had the resources, I would definitely want to visualize the data. That would simply be amazing! Well-visualized climate data are used and analysed the most, as researchers can then interpret the data much faster. The use of such interactive maps would massively contribute to the usability and the impact of the dataset. And that is what we all do it for: providing answers to pressing global hydrological problems!’
Support
Both Niko and Edwin appreciate the support from RDM Support and 4TU.ResearchData. They find that both parties actively look for ways to support researchers. ‘The contact opened up a lot of opportunities’, Niko says. ‘We are all working towards a common goal: sharing and reuse of research data. Our collaboration has already been very fruitful.’
This interview was a cooperation between 4TU.ResearchData and RDM Support of Utrecht University.
References
Sutanudjaja, E. H., van Beek, R., Wanders, N., Wada, Y., Bosmans, J. H. C., Drost, N., van der Ent, R. J., de Graaf, I. E. M., Hoch, J. M., de Jong, K., Karssenberg, D., López, P., Peßenteiner, S., Schmitz, O., Straatsma, M. W., Vannametee, E., Wisser, D., and Bierkens, M. F. P. (2018). PCR-GLOBWB 2: a 5 arcmin global hydrological and water resources model. Geosci. Model Dev., 11, 2429-2453. DOI:10.5194/gmd-11-2429-2018