Data Curation

This article defines data curation, explains its necessity and importance, highlights the role of a data curator within a research project team, and points to resources on data curation best practices and standards.

What is Data Curation?

Data curation refers to the various processes involved in ensuring that data can be discovered, accessed, understood, and used now and into the future. These processes are part of a disciplined practice that treats data as an object for long-term archival preservation and access, with attention to its technical aspects, but within the broader context of responsible and ethical conduct of research, scientific rigor and integrity, disciplinary culture and practice, and stakeholder mandates and expectations.

The practice of data curation comprises activities that are applied to dataset files based on data type, file format, domain, original and intended uses, and other important factors that determine how data files should be organized, documented, archived, and shared in a trustworthy data repository. The diagram below illustrates a high-level overview of a typical data curation workflow.

[Figure: Overview of a typical data curation workflow]

Data curation is a set of data management activities that should be considered when developing data management and sharing plans. When planning for data management, it is important to consider the provisions needed for curation, especially for data that are large in size or volume, require specialized hardware or software, contain sensitive information (e.g., protected health information (PHI) or personally identifiable information (PII)), or otherwise need specialized support.
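As a rough illustration of the planning considerations above, the following Python sketch takes a quick inventory of a dataset folder, summarizing files by format and flagging unusually large ones. It is only a starting point under stated assumptions: the function name inventory_dataset and the large_file_bytes threshold are hypothetical and not part of any standard or of this guidance, and a file extension is only a hint about the actual format. It does not detect sensitive information, which requires review by the research team.

```python
from collections import defaultdict
from pathlib import Path


def inventory_dataset(root, large_file_bytes=1_000_000_000):
    """Summarize the files under `root` by file extension.

    Returns a dict mapping each lowercase extension (or "(none)") to a dict
    holding the file count, the total size in bytes, and the relative paths
    of files at or above `large_file_bytes`. The default threshold of 1 GB
    is an arbitrary, hypothetical value; choose one that suits the project.
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "large_files": []})
    root = Path(root)
    # Walk the folder recursively in a stable (sorted) order.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        # Extensions are compared case-insensitively (.CSV and .csv together).
        ext = path.suffix.lower() or "(none)"
        entry = summary[ext]
        entry["count"] += 1
        entry["bytes"] += size
        if size >= large_file_bytes:
            entry["large_files"].append(path.relative_to(root).as_posix())
    return dict(summary)
```

Calling inventory_dataset on a project folder returns one entry per file extension, which can help a research team and a data curator discuss which formats may need specialized software and which files may need special storage arrangements.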

Why Curate Data?

Data curation is a disciplined practice that is essential for the long-term accessibility and usability of research data. It not only anticipates inevitable changes in technology that are likely to make it impossible to open a file in a few years' time, but is also sensitive to changes in how research is done. Even if data are housed in an established data repository, there are no guarantees that anyone, including you, will be able to use or even understand them if the data are not documented sufficiently. The video below offers a light-hearted (but very real) portrayal of why data curation should be incorporated into the research process.

https://youtu.be/N2zK3sAtr-4?feature=shared

The Data Curator

A data curator is responsible for the execution of the data curation workflow in accordance with standards and best practices for long-term preservation and access. The data curator is a professional who has specialized education (typically a graduate degree in information and library science) and training in archival principles and practice, digital preservation, electronic records management, information organization and retrieval, and other related topics. They also have experience working in various research settings to understand how these areas of study are applied throughout the research lifecycle from project planning to publication.

When planning for data management and sharing, a data curator will consider several aspects of the data within the context of the responsible conduct of research. Along with technical requirements for data handling and archival storage, data curators take into account data collection and analysis methods, informed consent requirements, standardized scientific metadata schemas, data quality control protocols, potential reuses of the data, and other critical matters for proper data management.

Data Curation Standards and Best Practices

To learn more about prevailing standards and best practices for research data management and sharing, please review the following resources:

Data Curation Lifecycle Model
The lifecycle model emphasizes the holistic nature of data curation practice and its necessary application throughout the research process from project conceptualization to data sharing.

FAIR Principles
Originally published in Scientific Data in 2016, the FAIR Principles for findable, accessible, interoperable, and reusable data have been promoted by major funding agencies as a set of recommended guidelines for ensuring access to scientific data.

10 Things for Curating Reproducible and FAIR Research
This document describes key issues for data curation practices that aim to ensure that the published findings of quantitative data-driven research can be computationally reproduced.

OAIS Reference Model
The Reference Model for an Open Archival Information System (OAIS) is an ISO standard (ISO 14721) that outlines recommended practices for digital archive organizations and the requirements of the systems they support.

CURATE(D)
The Data Curation Network, which is a membership organization of data repositories, offers a checklist of standard data curation steps for publishing high-quality data.

 

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

 

RDM Guidance formatting was influenced by The Writing Center, University of North Carolina at Chapel Hill Tips & Tools handouts.