This article describes the Data Curation service offered by the UNC Research Data Management Core (RDMC) and how to request this service.
What is Data Curation?
Data curation refers to the various processes involved in ensuring that data can be discovered, accessed, understood, and used now and into the future. These processes are part of a disciplined practice that considers the technical aspects of data as an object for long-term archival preservation and access--but within the broader context of responsible and ethical conduct of research, scientific rigor and integrity, disciplinary culture and practice, and stakeholder mandates and expectations.
The practice of data curation comprises an extensive list of activities that are applied to dataset files based on data type, file format, domain, original and intended uses, and other important factors that determine how data files should be organized, documented, archived, and shared in a trustworthy data repository. The diagram below illustrates a high-level overview of a typical data curation workflow.
Data curation is a set of data management activities that should be considered when developing data management and sharing plans. When planning for data management, it is important to consider the provisions needed for curation, especially for data that are large in size or volume, require specialized hardware or software, contain sensitive information (e.g., protected health information (PHI) or personally identifiable information (PII)), or otherwise need specialized support.
Why curate data?
Data curation is a disciplined practice that is essential for the long-term accessibility and usability of research data. It not only anticipates inevitable changes in technology that are likely to make it impossible to open a file in a few years' time, but is also sensitive to changes in how research is done. Even if data are housed in an established data repository, there are no guarantees that anyone, including you, will be able to use or even understand them if the data are not documented sufficiently. The video below offers a light-hearted (but very real) portrayal of why data curation should be incorporated into the research process.
https://youtu.be/N2zK3sAtr-4?feature=shared
The data curator
A data curator is responsible for the execution of the data curation workflow in accordance with standards and best practices for long-term preservation and access. The data curator is a professional who has specialized education (typically a graduate degree in information and library science) and training in archival principles and practice, digital preservation, electronic records management, information organization and retrieval, and other related topics. They also have experience working in various research settings to understand how these areas of study are applied throughout the research lifecycle, from project planning to publication.
When planning for data management and sharing, a data curator will consider several aspects of the data within the context of the responsible conduct of research. Along with technical requirements for data handling and archival storage, data curators take into account data collection and analysis methods, informed consent requirements, standardized scientific metadata schemas, data quality control protocols, potential reuses of the data, and other critical matters for proper data management.
Data curation standards and best practices
The RDMC Data Curation service was designed to align with prevailing standards and best practices for research data management and sharing including those listed below.
Data Curation Lifecycle Model. The lifecycle model emphasizes the holistic nature of data curation practice and its necessary application throughout the research process from project conceptualization to data sharing.
FAIR Principles. Originally published in Science in 2016, the FAIR Principles for findable, accessible, interoperable, reusable data have been promoted by major funding agencies as a set of recommended guidelines for ensuring access to scientific data.
10 Things for Curating Reproducible and FAIR Research. This document describes key issues for data curation practices that aim to ensure that the published findings of quantitative data-driven research can be computationally reproduced.
OAIS Reference Model. The Reference Model for an Open Archival Information System (OAIS) is an ISO standard (ISO 14721) that outlines recommended practices for digital archive organizations and the requirements of the systems they support.
CURATE(D). The Data Curation Network, which is a membership organization of data repositories, offers a checklist of standard data curation steps for publishing high-quality data.
RDMC Data Curation Service
The UNC Research Data Management Core (RDMC) offers a Data Curation service to the UNC research community. This RDMC Add-on Data Management Service is designed to ensure that datasets housed in data repositories are discoverable, understandable, and usable. Members of the RDMC research data stewardship team work with investigators and project teams to develop and execute data curation workflows for data file packaging and repository ingest that align with standards and best practices for data preservation and data quality.
Data Curation service scope of work
The scope of work for the RDMC Data Curation service for most projects will include the following primary data curation activities:
Dataset file preparation. Assembly and review of dataset files and associated materials to ensure that the data package submitted to the designated repository upholds FAIR data principles for findability, accessibility, interoperability, and reusability.
Dataset record creation. Creation of standardized descriptive and administrative metadata based on client-provided information about the data types, allowable uses, and research context.
Dataset file transfer. Upload of dataset files to a dataset record. Includes creation of file-level metadata, checksum review, application of access restrictions (if required), and inspection of files in the repository to confirm successful transfer.
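To illustrate what a checksum review during file transfer can involve, the following is a minimal Python sketch, not a description of the RDMC's actual tooling. It assumes SHA-256 as the checksum algorithm and assumes that the repository reports a checksum for each uploaded file. The function names (sha256_of, build_manifest, find_mismatches) are hypothetical and chosen only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum before upload."""
    return {
        path.relative_to(package_dir).as_posix(): sha256_of(path)
        for path in sorted(package_dir.rglob("*"))
        if path.is_file()
    }


def find_mismatches(local: dict[str, str], reported: dict[str, str]) -> list[str]:
    """List files whose repository-reported checksum is missing or different.

    `reported` is assumed to be collected from the repository after upload;
    how it is obtained depends on the repository and is not shown here.
    """
    return [name for name, checksum in local.items() if reported.get(name) != checksum]
```

In this sketch, a manifest is built from the local data package before upload and then compared with the checksums the repository reports afterward. An empty result from find_mismatches suggests that the files arrived unchanged; any file name it returns would need to be inspected and possibly re-uploaded.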
For other data management and sharing needs that fall outside of this scope of work for the Data Curation service, please visit the RDMC Add-on Services webpage for information about other services offered by RDMC that may meet project needs.
Requesting the Data Curation service
Requests for the RDMC Data Curation service can be submitted via the RDMC Services website.
Detailed information about your project allows the RDMC team to determine what services you need and accurately estimate costs based on those needs. Detailed information about the funding sponsor and their data management and sharing policies allows the RDMC team to recommend services that the sponsor considers allowable costs and that meet the sponsor’s specific data management and sharing requirements.
When submitting requests for the Data Curation service, please be prepared to provide the following information:
Program solicitation URL
Data sharing policy URL
Project proposal draft
Data management and sharing plan
Project start and end date
Estimated total project budget
Estimated budget for data curation
Once the RDMC team reviews submitted materials, you will receive a quote for the Data Curation service that includes a recommended scope of work. Please note that this quote is offered as a broad-brush scope of work estimate based on several assumptions about project needs and timelines. If these assumptions are inaccurate or if the scope of work changes, the associated fees will be updated.
Budgeting for the Data Curation service
Data curation may be considered by funding agencies to be an allowable cost that can be included in the proposal budget. Costs for data curation vary based on several factors, including the type, volume, complexity, and sensitivity of the data.
Review funding agency policies for guidance on allowable costs for data management. If you have questions about whether or not a service is considered an allowable cost, please contact OSPHelp@unc.edu for assistance.
Data Curation service fees are based on hourly recharge rates approved by the UNC Office of Sponsored Programs. Please visit the RDMC Add-on Services webpage for current rates and hourly minimums.
The text below describing the RDMC Data Curation service may be used in the budget justification component of the project budget as applicable.
The RDMC Data Management and Curation service fee covers the cost of executing standards-based data curation workflows for data file packaging and repository ingest, which includes file format normalization, document preparation, metadata generation, dataset package quality review, and transfer of dataset packages to the specified repository. These curation activities are carried out by RDMC data curation specialists with the specific needs of the data type, format, size, and content in mind to ensure that the data are findable, accessible, interoperable, and reusable (i.e., FAIR), and to satisfy all DMSP requirements.
To learn more about the RDMC Data Curation service or to obtain a quote, please submit your request via the RDMC Services website.