16 October 2018

Google’s data hunger: a giant appetite stilled by Yoda

Since February 2018, researchers at Utrecht University have been able to store, archive and publish their research data on Yoda. In doing so, their data also becomes available through the DANS Narcis Catalogue. Sounds nice … but hardly international.

In comes Google with Google Dataset Search, a search engine dedicated to datasets. The engine, currently available as a beta product, is marketed as a service for researchers and journalists, though some self-interest may have played a role, considering Google’s own appetite for data as input for its machine-learning algorithms.

The new search engine tracks ‘datasets’ in data repositories and on websites. To do so, it depends on specific metadata on the websites it harvests. This means that a dataset, whether it is one you are looking for or one of your own that you want others to find, must first be published on a platform that provides the required metadata markers. Many data providers have added the required metadata or are busy doing so. For example, all datasets archived or harvested by DANS are findable via Dataset Search.
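To give an impression of what such metadata markers can look like, here is a minimal sketch in Python. It assumes the schema.org `Dataset` vocabulary embedded as JSON-LD in a landing page, which is one of the formats Google documents for dataset markup. The function names, the example values and the example.org URL are hypothetical and only serve as illustration; this is not how Yoda or DANS generate their metadata.

```python
import json


def build_dataset_jsonld(name, description, url, creator, license_url=None, keywords=None):
    """Build a schema.org Dataset description as a plain dictionary.

    Which fields to include is an assumption for this sketch; a real
    repository would typically expose many more properties.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator},
    }
    # Optional properties are only added when a value is supplied.
    if license_url:
        record["license"] = license_url
    if keywords:
        record["keywords"] = list(keywords)
    return record


def to_script_tag(record):
    """Wrap the record in the script element that a landing page would embed."""
    body = json.dumps(record, indent=2, ensure_ascii=False)
    # Escape "</" so that text inside the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for illustration.
    example = build_dataset_jsonld(
        name="Example survey data",
        description="Hypothetical survey responses used to illustrate dataset markup.",
        url="https://example.org/datasets/1",
        creator="A. Researcher",
        keywords=["survey", "example"],
    )
    print(to_script_tag(example))
```

Researchers who publish through a repository normally do not need to write this themselves: the platform generates the markup from the metadata entered at deposit.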

As DANS harvests Yoda, our own university’s research data infrastructure, all research data you archive and publish in Yoda will be discoverable via Google Dataset Search, as this and this example show. The same goes for datasets archived in Utrecht’s DataverseNL environment.


According to Google, the current offering leans heavily towards governmental, geo- and sociological data. It calls on both organizations and individual researchers to add the required metadata to their websites, and offers support and tooling to do so.

So the service appears to offer added value as a central platform for finding data. It is still a beta version, and features such as advanced search are not (yet) available. Google does have a reputation for offering services to the public only to shut them down after a trial period, but if this one becomes as popular as Google Scholar, it will have a bright future and can become an important tool for researchers all over the world.

Research Data Management Support will keep track of further developments in Google Dataset Search.