Costs of data management

To help you estimate the costs of data management, this page presents an overview of possible costs per research phase and research activity.

 

1. Estimate the costs of data management 

To help you estimate the costs involved in making your research data findable, accessible, interoperable and reusable (FAIR), see the overview of possible costs per research phase and research activity below.

The content is based on the Data Management Costing Tool, developed by the UK Data Service.

I. Costs for data collection
A. Acquiring external datasets
Question to consider:
Do you plan to use existing (commercial or open) data?

Estimated costs:
Example: a faculty license for a database for macro-economic analyses costs approximately €18,000 per year.

Tips:
- Your library may be able to help you acquire a license for a crucial database.

- In research data repositories, data may be available at little or no cost.

 

B. Granularity of data
Question to consider:
Do you collect data at the same level of detail that you want to, and will be able to, process?

Estimated costs:
Example: if collecting half the data is enough, the costs of collecting, transferring, storing etc. will also roughly halve.

Tips:
It can be tempting to collect as much data as you can. However, if you collect data every second but already plan to average it to daily values, sampling hourly, or even less often, may still give you reliable information.

You can run a pilot to assess the granularity you need to answer your research question, or consult a statistician to calculate how much data you need.
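A quick calculation makes the granularity point concrete. The Python sketch below counts samples per day at different sampling intervals. It assumes that collection, transfer and storage costs scale roughly in proportion to the number of samples, which will not hold for every set-up; `samples_per_day` is a hypothetical helper, not part of any tool mentioned on this page.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def samples_per_day(interval_seconds: int) -> int:
    """Number of samples collected per day at a fixed sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds


per_second = samples_per_day(1)      # one sample every second
hourly = samples_per_day(60 * 60)    # one sample every hour
ratio = per_second // hourly         # how many times fewer samples hourly sampling yields

print(f"Per second: {per_second:,} samples per day")
print(f"Hourly:     {hourly:,} samples per day")
print(f"Hourly sampling collects {ratio:,} times fewer samples")
```

Under that proportionality assumption, sampling hourly instead of every second reduces the data volume by a factor of 3,600, before any averaging to daily values.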

 

C. Formatting and organising
Questions to consider:
Are your data files, spreadsheets, measurements, interview transcripts, records etc. all stored in a uniform format or style, clearly named with unique file names and well organised?

Estimated costs:
Per project, organising the style, format and names can be done by a student assistant at level 1 salary (~17 euro per hour) or a data manager at level 2 salary (~60 euro per hour).

Tips:
If you plan data formats and data organisation beforehand, by developing templates and data entry forms for individual data files (transcripts, spreadsheets, databases) and by setting up clear file structures, little or no additional cost will apply. If you have to develop these afterwards, the costs will be higher.
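Clear, unique file names are easier to maintain when the naming convention is written down and checked automatically. The Python sketch below assumes a hypothetical convention, `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`; the pattern and the helper names `make_name`, `check_names` and `find_duplicates` are illustrative only, so adapt them to the convention your project agrees on.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9]+"     # project code
    r"_\d{8}"            # date as YYYYMMDD
    r"_[a-z0-9-]+"       # short description, lowercase with hyphens
    r"_v\d{2}"           # two-digit version number
    r"\.[a-z0-9]+$"      # file extension
)


def make_name(project: str, day: date, description: str, version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention."""
    return f"{project}_{day:%Y%m%d}_{description}_v{version:02d}.{ext}"


def check_names(names: list[str]) -> list[str]:
    """Return the names that do not follow the convention."""
    return [n for n in names if NAME_PATTERN.match(n) is None]


def find_duplicates(names: list[str]) -> list[str]:
    """Return names that occur more than once (file names should be unique)."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


name = make_name("SURVEY1", date(2024, 3, 5), "interview-03", 1, "docx")
print(name)                                           # SURVEY1_20240305_interview-03_v01.docx
print(check_names([name, "final version (2).docx"]))  # ['final version (2).docx']
```

Running such a check at the start of a project, rather than renaming files afterwards, is what keeps the cost of this activity low.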

 

D. Transcription
Questions to consider:
- Will you transcribe qualitative data (e.g. recorded interviews or focus group sessions) as part of your research?

- Or will you need to do this specifically so that data can be more easily shared and reused?

- Is full or partial transcription needed?

- Is additional hardware/software needed?

- Is translation needed?

- Will you need to develop a standard transcription template or transcription guidelines to ensure consistent formatting?

Estimated costs:
Example: transcription takes four to eight hours per hour of recording. Visit the transcribing calculator to estimate the time needed for your project.

Tips:
- If you embed transcription in your research practice, very little or no additional cost will apply.

- Consider the costs of (the time needed for) developing procedures, templates and guidance for transcribers.
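The four-to-eight-hours rule of thumb translates directly into a budget range. The Python sketch below assumes transcription is done at the level 1 student assistant rate (~17 euro per hour) quoted elsewhere on this page; the function name and the example project (20 interviews of 1.5 hours) are hypothetical.

```python
def transcription_estimate(recording_hours: float,
                           hours_per_recording_hour: tuple[float, float] = (4.0, 8.0),
                           hourly_rate_eur: float = 17.0) -> dict[str, tuple[float, float]]:
    """Return (low, high) estimates of transcription hours and costs in euros."""
    low, high = hours_per_recording_hour
    hours = (recording_hours * low, recording_hours * high)
    costs = (hours[0] * hourly_rate_eur, hours[1] * hourly_rate_eur)
    return {"hours": hours, "cost_eur": costs}


# Hypothetical project: 20 interviews of 1.5 hours each.
estimate = transcription_estimate(20 * 1.5)
low_h, high_h = estimate["hours"]
low_eur, high_eur = estimate["cost_eur"]
print(f"{low_h:.0f} to {high_h:.0f} hours, €{low_eur:,.0f} to €{high_eur:,.0f}")
# 120 to 240 hours, €2,040 to €4,080
```

Time for developing templates and guidance, and for translation if needed, would come on top of this range.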

 

E. Consent for data sharing
Question to consider:
Do you need to ask participants for their consent for data sharing?

Estimated costs:
Gaining informed consent can be done by a student assistant at level 1 salary (~17 euro per hour) or a data manager at level 2 salary (~60 euro per hour).

Tips:
- When consent for data sharing is included in standard consent procedures early in the research, very little or no additional cost applies.

- When participants need to be recontacted or revisited to obtain active consent, costs may be high, e.g. because of extra preparation of information sheets and consent forms, consent discussions or training of interviewers.

 

F. Data transfer
Questions to consider:
- Are special measures needed to transfer data from mobile devices, fieldwork sites or home equipment to a central work server?

- Is software or hardware needed for encryption before data transfer, or for synchronising data files across sites?

Estimated costs:
Free encryption or data transfer software (e.g. SURFfilesender) is available in most cases.

Tips:
- See the storage solutions of Utrecht University for more information on SURFfilesender.

- BoxCryptor can be used for encryption at Utrecht University.

 

II. Costs for data documentation
A. Data description and metadata
Questions to consider:
- Is data in a spreadsheet, database or data warehouse clearly marked with variables, variable labels and value labels, code descriptions, missing value descriptions, etc.?

- Are validated questionnaires and standard coding used? Are labels consistent?

- Are files, records and items in the collection clearly described with well-defined metadata, or with a metadata standard, so that the relations between them can be interpreted and their content can be quickly selected and understood?

- Does textual data, such as interview transcripts, need a description of the context, e.g. included as a cover page?

Estimated costs:
Examples:

- 4 hours per single experiment (120 measurements) to fill in 60 required metadata fields, with the assistance of a data manager at level 2 salary (~60 euro per hour).

- According to the UK Data Archive, two to three weeks are costed into an average two-year research grant application to prepare and collate materials for deposit.

Tips:
- If data description is carried out as part of data creation, data input or data transcription, little or no additional cost will apply.

- If data descriptions need to be added or harmonised afterwards, the costs will be higher.

- Codebooks for datasets can often be exported easily from software packages.
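Scaling the per-experiment example up to a whole project is a matter of multiplication. The Python sketch below assumes that every experiment needs the same 4 hours of metadata work as in the example and that all of it is charged at the level 2 rate; the function name and the count of 25 experiments are hypothetical.

```python
HOURS_PER_EXPERIMENT = 4     # from the example: 60 metadata fields per experiment
DATA_MANAGER_RATE_EUR = 60   # level 2 salary, ~60 euro per hour


def metadata_cost(n_experiments: int) -> tuple[int, int]:
    """Return (hours, euros) needed to describe n experiments."""
    hours = n_experiments * HOURS_PER_EXPERIMENT
    return hours, hours * DATA_MANAGER_RATE_EUR


hours, cost = metadata_cost(25)  # hypothetical: 25 experiments
print(f"{hours} hours, €{cost:,}")  # 100 hours, €6,000
```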

 

B. Documentation
Question to consider:
Do you have documentation for the data that describes the context and methodology of how the data was gathered, created, processed and quality-controlled?

Estimated costs:
Researcher at level 2 salary (~60 euro per hour).

Tips:
- Essential contextual and methods documentation is often written up in publications and reports.

- If all data creation steps are well documented and documentation is kept well organised during research, little or no additional cost will apply.

- If documentation needs to be written or compiled afterwards, the costs will be higher.

 

III. Costs for data storage
A. Data back-up
Questions to consider:
- How frequently should back-ups be made, and how many back-ups should be stored?

- Does your institution provide regular back-ups?

Estimated costs:
Examples:

- University drive: €0.80 per GB/year.

- Cloud: €0.30 per GB/year.

- 2 x hard drive: €0.14 per GB (single purchase).

Tips:
- Institutional back-up is often included in standard indirect costs/overheads.

- The cost of additional back-up will depend on the number of copies to be kept, the frequency of back-up and the required storage media.
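Because two of the example prices recur yearly and one is a single purchase, the cheapest option depends on data volume and project length. The Python sketch below compares the three examples; it assumes the €0.14 per GB figure covers both hard drives together and ignores hardware replacement and the labour of making back-ups, so treat the output as a lower bound. The function name and the 500 GB / 4 year scenario are hypothetical.

```python
def backup_costs(gigabytes: float, years: int) -> dict[str, float]:
    """Total cost in euros per back-up option over the project period."""
    return {
        "university drive": 0.80 * gigabytes * years,  # recurring, per GB/year
        "cloud": 0.30 * gigabytes * years,             # recurring, per GB/year
        "2 x hard drive": 0.14 * gigabytes,            # single purchase
    }


# Hypothetical project: 500 GB kept for 4 years.
costs = backup_costs(500, 4)
for option, cost in costs.items():
    print(f"{option:<17} €{cost:>9,.2f}")
```

With these figures the university drive comes to about €1,600, the cloud to about €600 and the hard drives to about €70; the cheaper options may require more of your own effort to manage.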

 

B. Data storage
Questions to consider:
- How much data storage space is needed for the entire duration of your project?

- Do you need to set up a data model and accompanying database for the data?

- Do you need a data warehouse or a database architect?

Estimated costs:
Examples:

- Cloud database as a service: €160/month (storage 5 GB, transfer 30 GB).

- Database architect at level 2 salary (~60 euro per hour).

Tips:
- Institutional storage is often included in standard indirect costs/overheads.

- Costs for additional storage could include server or disk space, as well as the costs of setup and maintenance.
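The two examples combine into a simple formula: subscription months times the monthly fee, plus architect hours times the level 2 rate. The Python sketch below assumes the cloud database is needed for the full project and that its 5 GB storage and 30 GB transfer limits suffice; the function name and the 36-month, 40-hour scenario are hypothetical.

```python
def database_costs(months: int, architect_hours: float,
                   monthly_fee_eur: float = 160.0,
                   architect_rate_eur: float = 60.0) -> float:
    """Subscription fees plus database architect time, in euros."""
    subscription = months * monthly_fee_eur
    architect = architect_hours * architect_rate_eur
    return subscription + architect


# Hypothetical: a 3-year project with 40 hours of database design.
total = database_costs(months=36, architect_hours=40)
print(f"€{total:,.2f}")  # €8,160.00
```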

 

IV. Costs for data access and security
A. Data access
Questions to consider:
- Do external people require access to research data?

- Does remote access via VPN or secure FTP need to be arranged for external people?

Estimated costs:
Often, researchers can use existing (free) services.

Tips:
At Utrecht University you can use SURFfilesender or SURFdrive. See 'Storage solutions'.

 

B. Data security
Questions to consider:
- Do you need to protect data against unauthorised access or disclosure?

- Is an institutional server available where you can store your data safely?

- Can security be arranged by institutional IT services, or is extra software/hardware needed?

- Do your data files need encryption before storage or transfer?

Estimated costs:
Examples:

- A trusted third party (TTP) for pseudonymisation: approximately €1,000 to €30,000, depending on the type of pseudonymisation.

- Existing encryption services can be used at no cost.

Tips:
For confidential or privacy-sensitive data, determining the conditions for controlling access to shared data may require extra time and discussion. See the guide to handling personal data.

 

V. Costs for data preservation
A. File format
Questions to consider:
- Does data need to be converted to a standard or open format for long-term preservation?

- Is additional software or hardware needed for conversion?

Estimated costs:
Researcher at level 2* salary (~60 euro per hour).

Tips:
For audiovisual data, converting to open digital formats can be time-consuming or require special equipment; converting databases may require special software. Conversions may also require checking for truncation, loss of metadata or annotations, loss of relationships, etc.

 

VI. Costs for data availability and reuse
A. Anonymisation
Questions to consider:
- Do you need to remove identifying information or conceal the identity of participants (e.g. using pseudonyms) before data can be shared?

- Have you considered measures to ensure that anonymisation is consistent throughout data collection?

Estimated costs:
Example:

Transcribing and simultaneously anonymising audio (speech):

- Up to one hour per 5-minute fragment (depending on the required level of detail of the transcription).
- Student assistant at level 1 salary (~17 euro per hour).
- Free software is available.

Tips:
- If anonymisation is planned before data collection or transcription/digitisation, lower costs will apply.

- Anonymising audiovisual data, voices or faces can be very costly and could reduce the usefulness of the data.

- For quantitative data (e.g. survey data), costs can be kept low if identifiers are excluded from data files a priori, are easy to remove, or are coded to avoid disclosure. Costs may be higher if variables need recoding afterwards to avoid disclosure.

- For qualitative textual data (e.g. interview transcripts), costs can be reduced if anonymisation is carried out during transcription (or if identifying information is at least highlighted/coded during transcription).

- Costs depend on how sensitive or complex the data is and how much identifying information is recorded in it. If only names need to be removed, costs are low; pseudonymisation, however, will take more time.
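For the consistency question above, the key requirement is that the same person gets the same pseudonym in every file. The Python sketch below keeps one mapping across transcripts; it only replaces names you list explicitly (it does not detect names or indirect identifiers), and the class name and the code format P001, P002, ... are hypothetical. Pseudonymised data remains personal data under the GDPR, so store the key separately and securely.

```python
import re


class Pseudonymiser:
    """Replace listed names with stable codes (P001, P002, ...)."""

    def __init__(self, prefix: str = "P") -> None:
        self.prefix = prefix
        # Key linking names to codes: store it separately from the shared data.
        self.key: dict[str, str] = {}

    def code_for(self, name: str) -> str:
        """Return the existing code for a name, or assign the next one."""
        if name not in self.key:
            self.key[name] = f"{self.prefix}{len(self.key) + 1:03d}"
        return self.key[name]

    def apply(self, text: str, names: list[str]) -> str:
        """Replace whole-word occurrences of each listed name in the text."""
        for name in names:
            pattern = re.compile(rf"\b{re.escape(name)}\b")
            text = pattern.sub(self.code_for(name), text)
        return text


p = Pseudonymiser()
first = p.apply("Jan told Maria about the study.", ["Jan", "Maria"])
second = p.apply("Maria disagreed with Jan.", ["Maria", "Jan"])
print(first)   # P001 told P002 about the study.
print(second)  # P002 disagreed with P001.
```

Reusing one `Pseudonymiser` object (or its saved key) for all transcripts is what keeps the pseudonyms consistent across the whole collection.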

 

B. Copyright
Questions to consider:
- Do other parties hold copyright in the data?

- Do you need to seek copyright clearance before sharing data?

- Is legal advice required?

Estimated costs:
Legal advice at level 3* salary (external expert, ~160 euro per hour).

Tips:
Seeking clarity in advance will ensure that you do not jeopardise the progress of your research later on.

 

C. Data sharing
Questions to consider:
- Will your data be deposited with a data centre or institutional repository?

- What requirements exist for preparing data to particular standards, e.g. regarding documentation or format?

- Does structured metadata need to be created when data is shared via a data centre or archive, e.g. completing a deposit form for 4TU.ResearchData or DANS?

- Which data will or will not be retained, and for how long?

Estimated costs:
Examples:

- Completing a data repository upload form (e.g. for 4TU.ResearchData or DANS) may take 15 minutes to 4 hours.

- Dryad: €110, one-off (max. 20 GB).

- DataverseNL: €3.60 per GB/year.

- Cloud database as a service: €160/month (storage 5 GB, transfer 30 GB).

Tips:
- A public repository, data centre or data journal can give you the opportunity to share your data for reuse. To prepare data for sharing and preservation, find out how much data deposit and/or longer-term storage will cost per year, including the time and effort involved.

- Data centres will have their own metadata forms. Consider using these during your research.
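The two repository examples use different pricing models, a one-off fee versus a charge per GB per year, so which is cheaper depends on dataset size and retention period. The Python sketch below assumes the quoted prices still apply and that the Dryad fee covers the whole retention period; check the repositories' own tariffs before budgeting. The function names and the 10 GB / 10 year scenario are hypothetical.

```python
from typing import Optional

DRYAD_ONE_OFF_EUR = 110.0
DRYAD_MAX_GB = 20
DATAVERSENL_EUR_PER_GB_YEAR = 3.60


def dryad_cost(gigabytes: float) -> Optional[float]:
    """One-off Dryad charge, or None if the dataset exceeds the quoted limit."""
    return DRYAD_ONE_OFF_EUR if gigabytes <= DRYAD_MAX_GB else None


def dataversenl_cost(gigabytes: float, years: int) -> float:
    """Recurring DataverseNL charge over the given number of years."""
    return DATAVERSENL_EUR_PER_GB_YEAR * gigabytes * years


# Hypothetical dataset: 10 GB, to be kept for 10 years.
print(dryad_cost(10))                       # 110.0
print(round(dataversenl_cost(10, 10), 2))   # 360.0
```

Under these assumptions the recurring charge overtakes the one-off fee after roughly three years for a 10 GB dataset; for datasets above 20 GB the Dryad example no longer applies.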

 

D. Data cleaning
Questions to consider:
- Does quantitative data need to be cleaned, checked or verified before sharing, e.g. to check the validity of the codes used or to check for anomalous values?

- Does the data match the documentation, e.g. the same number of variables, cases, records, files?

- Does textual information in the data need to be spell-checked?

- Do you need to combine your data with other datasets for your research?

Estimated costs:
Examples:

- According to DataSopic, a data cleaning service costs from €270 to well over €1,800.

- Research/data manager at level 2 salary (~60 euro per hour).

Tips:
- Data cleaning takes time, so budget for it explicitly.

- If you carry out data cleaning as part of data entry and preparation (before data analysis), little additional cost will apply.
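Some of the checks listed above can be automated during data entry. The Python sketch below compares records against a small codebook (allowed codes and plausible ranges) and against the number of cases stated in the documentation; the codebook, the variable names and the `check_records` function are hypothetical.

```python
# Hypothetical codebook: allowed codes or plausible ranges per variable.
CODEBOOK = {
    "gender": {"codes": {1, 2, 3, 9}},  # 9 = missing value
    "age": {"range": (18, 99)},
}


def check_records(records: list[dict], codebook: dict, expected_cases: int) -> list[str]:
    """Return a list of problems found in the records."""
    problems = []
    # Does the data match the documentation (number of cases)?
    if len(records) != expected_cases:
        problems.append(f"expected {expected_cases} cases, found {len(records)}")
    for i, record in enumerate(records, start=1):
        # Same variables as in the codebook?
        if set(record) != set(codebook):
            problems.append(f"case {i}: variables do not match the codebook")
        for var, rule in codebook.items():
            if var not in record:
                continue
            value = record[var]
            if "codes" in rule and value not in rule["codes"]:
                problems.append(f"case {i}: invalid code {value} for {var}")
            if "range" in rule:
                low, high = rule["range"]
                if not low <= value <= high:
                    problems.append(f"case {i}: anomalous value {value} for {var}")
    return problems


data = [
    {"gender": 1, "age": 34},
    {"gender": 5, "age": 41},
    {"gender": 2, "age": 230},
]
for problem in check_records(data, CODEBOOK, expected_cases=3):
    print(problem)
# case 2: invalid code 5 for gender
# case 3: anomalous value 230 for age
```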

 

E. Digitisation
Questions to consider:
- Does analogue or paper-based research data (maps, newspaper clippings, photographs, images, text) need to be digitised to increase its potential for sharing?

- Is additional equipment or software needed for scanning or conversion?

Estimated costs:
Example:

Digitisation: €0.50 per page (for a few pages) or €320 to €390 per 1,000 pages, including optical character recognition (OCR).

Tips:
- If simple image scanning of text is sufficient, costs will be relatively low.

- If OCR is required with manual checking for accuracy (revising the entire scanned text), costs may be high.

- If manual data entry or typing is needed, e.g. to digitise tabular data, costs may be high.
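For larger collections it is worth comparing the two tariffs in the example. The Python sketch below assumes the bulk price scales pro rata for partial thousands of pages and that the per-page price would also be offered for larger volumes, which a digitisation service may not do; the function name and the 5,000-page scenario are hypothetical.

```python
def digitisation_options(pages: int) -> dict[str, tuple[float, float]]:
    """(low, high) cost in euros under each tariff in the example."""
    per_page = pages * 0.50
    thousands = pages / 1000
    return {
        "per page": (per_page, per_page),
        "per 1,000 pages, OCR included": (thousands * 320, thousands * 390),
    }


# Hypothetical archive: 5,000 pages.
options = digitisation_options(5000)
for tariff, (low, high) in options.items():
    print(f"{tariff}: €{low:,.2f} to €{high:,.2f}")
```

For 5,000 pages this gives €2,500 at the per-page tariff versus €1,600 to €1,950 at the bulk tariff, before any manual checking of the OCR output.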

 

VII. Overall
A. Operationalising data management
Questions to consider:
- What measures are needed to implement and operationalise data management?

- Do you need extra time and resources to implement data management throughout your research, e.g. regular team meetings, setting up a collaborative research environment?

- Do you need a dedicated data manager?

- Do you need staff training?

- Do you need to allocate roles and responsibilities for various data management activities?

Estimated costs:
- Data manager at level 2* salary (~60 euro per hour).

- Meetings: travel costs, lunch, time.

Tips:
If multiple partner institutions, researchers or funders are involved in your research project, consider the costs of data management planning meetings or discussions.

Writing a data management plan itself will take about two hours to two days, depending on the complexity of your project. It is time well spent, because planning data management early (especially when preparing a funding application) can significantly reduce costs.
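Once the individual activities are estimated, a simple list of line items keeps the overall budget transparent. The Python sketch below uses the hourly rates quoted on this page (level 1 ~€17, level 2 ~€60, level 3 ~€160); the line items, the hours and the `LineItem` class are hypothetical, and the 16 hours for the data management plan assume the two-day upper estimate with 8-hour working days.

```python
from dataclasses import dataclass

# Hourly rates quoted on this page (euros), by salary level.
HOURLY_RATES_EUR = {1: 17, 2: 60, 3: 160}


@dataclass
class LineItem:
    activity: str
    hours: float = 0.0
    salary_level: int = 2
    fixed_eur: float = 0.0  # e.g. licenses, storage, travel

    def cost(self) -> float:
        return self.hours * HOURLY_RATES_EUR[self.salary_level] + self.fixed_eur


# Hypothetical project budget.
budget = [
    LineItem("Writing the data management plan", hours=16, salary_level=2),
    LineItem("Consent procedures", hours=20, salary_level=1),
    LineItem("Legal advice on copyright", hours=5, salary_level=3),
    LineItem("Cloud back-up, 500 GB for 4 years", fixed_eur=600),
]

for item in budget:
    print(f"{item.activity:<36} €{item.cost():>9,.2f}")
total = sum(item.cost() for item in budget)
print(f"{'Total':<36} €{total:>9,.2f}")  # total comes to €2,700.00
```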
Research Data Management Support

2. Costs eligible for funding

Most funders consider the costs of data management eligible for funding. Already in the proposal phase, most research funders ask you to think explicitly about the (costs of the) management and publication of your research data, both during and after your research project. Your Research Support Office offers a Research Funding Toolkit with an overview of the specifics per funder (log in with your SolisID).

If you have questions about funder requirements, see the contact details of the Research Support Offices on the UU intranet.
If you have questions about the content of your Data Management Plan, see the 'Guide: Data management planning' or contact us right away.