Lacking documentation for an existing data set

Crashing hard drives? Backups gone missing? Bugs in your code? Lost all your data? Who ya gonna call? Just the thought of it! And yet it happens every day. During this Data Horror Week, researchers will share these horror stories, based on their own experience. To prevent you from making the same mistake!

Tell us your horror story, what happened?

When I started my PhD, I was told to work on unpublished data that had been collected 3 years before I started. This was supposed to give me insight into the data and into part of the topic I was working on. I received various folders full of data. Going through them, I found several datasheets with duplicate names but different contents, scripts that nobody could explain (neither what they did nor why), column names whose meaning was very difficult to ascertain, and little information on the exact equipment and settings used (especially since the measurements had been performed several years earlier). In the end, it took me roughly 6 months to figure out what had been done and what the data meant, and several talks with the manufacturer of the equipment to reach the conclusion that the data were poor at best and should not be used for publication.

How long ago was it?

6 to 7 years ago.

How was this solved?

Through several meetings with the manufacturer of the equipment used, several meetings with the researchers who had collected the data years before, and a tedious step-by-step replication of the data using the available, poorly documented scripts. In the end, it could not be fully resolved because the methods, data, and scripts were poorly described. It was a waste of time and resources, for me but also for the researchers who had done the work several years before.

How could this horror be avoided?

By planning and describing the data collection and analysis process. Although it takes quite some time to describe well what you are doing or have done, it takes even more time, and causes more frustration, to try to figure out what was done several years ago. Even if you think a (poor) description of the data will remain clear to you for years to come, it is likely that you or others will end up scratching your heads trying to figure out what you meant.
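
One way to make such a description concrete is a small data dictionary kept next to the data, with one entry per column. The sketch below is only an illustration and not something used in the story: the column names, the descriptions, and the helper `undocumented_columns` are all hypothetical. It shows how a few lines of Python could flag columns in a datasheet that nobody has described yet.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, with meaning and unit.
DATA_DICTIONARY = {
    "sample_id": "Unique identifier of the measured sample",
    "temp_c": "Sample temperature during measurement, in degrees Celsius",
}


def undocumented_columns(csv_text, data_dictionary):
    """Return the column names in the CSV header that have no description."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [
        name for name in header
        if not data_dictionary.get(name, "").strip()
    ]


# Hypothetical datasheet with one cryptic, undocumented column.
example_csv = "sample_id,temp_c,x2_corr\nA1,21.5,0.93\n"
missing = undocumented_columns(example_csv, DATA_DICTIONARY)
print(missing)  # prints ['x2_corr']
```

Running a check like this whenever a datasheet changes would keep the descriptions from drifting out of date. The same idea works with a plain README file if code is not an option.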

What lesson can we learn from this story?

Documentation takes time but is very valuable when you revisit the data.

For advice to prevent you from making the same mistake, contact or go to our website.

Data Horror Week

This Data Horror Week is an initiative of the RDM Support desks at Utrecht University, TU Delft, Leiden University and the University of Twente. For more stories, go to the Data Horror Week website. To stay up to date on all the horror stories this week, follow us on Twitter!