What is This Guide About?

This guide identifies the main differences between cloud storage, data collection tools, and data repositories with regard to long-term preservation, discoverability, and data sharing. It also demonstrates how each can be used during a research project and provides resources for learning more about platforms used by many UNC researchers.

Definitions

  • Cloud storage is a method of storing digital files on multiple distributed servers, typically managed by a host service (e.g., Amazon, Google, OneDrive, Box). The host service manages and maintains the physical servers and ensures that the stored files remain accessible and secure.

  • Data collection tools offer a variety of features, but their common thread is that users can collect, manage, and store research data throughout the life of a project. Some tools let users manipulate the data within the platform, while others allow users to export data in multiple formats used by common statistical software. Examples of web-based data collection tools include Qualtrics, REDCap, and Google Forms.

  • A data repository is a platform built specifically for preserving and sharing research data. Data repositories provide persistent, stable identifiers (e.g., DOIs, handles) to enable continued access to the metadata related to the data files. A data repository also has mechanisms in place for ensuring the long-term preservation and storage of its holdings (e.g., multiple copies and backups, file migrations), and it performs periodic data integrity audits to verify that the stored files remain intact. Data repositories are designed to promote the discovery of and access to research data by generating robust metadata that follow data archival best practices and community standards.
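
To make the idea of a data integrity audit more concrete, the following is a minimal sketch of a checksum-based fixity check using Python's standard library. It is an illustration only and does not describe how any particular repository implements its audits; the function names `sha256_of`, `build_manifest`, and `audit` are hypothetical, and SHA-256 is an assumed choice of checksum algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def audit(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently in folder against a recorded manifest."""
    current = build_manifest(folder)
    return {
        # Files recorded earlier that can no longer be found
        "missing": sorted(set(manifest) - set(current)),
        # Files present now that were never recorded
        "unexpected": sorted(set(current) - set(manifest)),
        # Files whose contents no longer match the recorded checksum
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In this sketch, a manifest would be recorded once (for example, at deposit) and `audit` would be run periodically afterwards; any entry under `missing` or `changed` signals that a file should be restored from a backup copy.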

Differences Between Each

Each tool is useful for specific purposes, but no single tool serves every purpose. It is important to recognize these differences and uses when crafting a data management and sharing plan. The table below highlights the features and major differences between cloud storage, data collection tools, and data repositories.

 

| Feature | Cloud Storage | Data Collection Software | Data Repositories |
| --- | --- | --- | --- |
| File storage | Yes – provides storage for files up to a size limit that depends on the platform; however, long-term storage may not be provided, depending on the platform and/or user agreement (e.g., data may be deleted after a certain period of time or inactivity) | Yes – provides storage for files up to a size limit that depends on the platform; however, long-term storage may not be provided, depending on the platform and/or user agreement (e.g., data may be deleted after a certain period of inactivity) | Yes – provides long-term storage with multiple copies and backups for files up to a size limit that depends on the platform. Review the repository’s sustainability plan or preservation policy for specifics on preservation and storage |
| Preservation formats (i.e., open-source and/or software-agnostic file formats that the data archiving community has determined to be stable and most likely to work in the future) | No – data are stored as they are uploaded to the cloud | No – data are stored as they are collected in the platform | Yes/No – some data repositories (e.g., UNC Dataverse) create preservation formats for specific data file types |
| Persistent identifiers | No – not appropriate for compliance with most funders’ data sharing requirements | No – not appropriate for compliance with most funders’ data sharing requirements | Yes – provides a persistent, stable identifier to enable access to the data; if the data are removed from the repository, the persistent identifier will still resolve to the metadata page for those data |
| Discoverability | No – not appropriate for compliance with data sharing requirements | No – not appropriate for compliance with data sharing requirements | Yes/No – trustworthy repositories provide robust metadata, in compliance with standards and best practices, that allow users to search for and discover research data online; however, some repositories may not use standardized metadata. Review the repository’s submission guidelines and digital preservation policies to determine whether it uses standard metadata |
| Data analysis | No | Yes – some data collection software offers the ability to analyze collected data within the platform | Yes/No – a few data repositories link to external tools for data analysis, but most repositories do not include data analysis software |
| Checksums and/or file audits | Yes – cloud storage services typically offer some form of monitoring or audit service that tracks the files stored and accessed within the cloud, to ensure the security and integrity of the files | Yes/No – some data collection software is built on cloud storage, so it may offer audit and monitoring services that show what happened to a file. This depends on the platform, so review the features and services offered before making your decision | Yes/No – as part of its commitment to data preservation, a data repository should generate file checksums and audit its holdings to identify discrepancies or degradation; if issues are discovered, it should have backups so that files are preserved in their original format. Before submitting to a repository, check its digital preservation policies and guidelines to determine whether it performs routine audits and backups of its holdings |
| File format exports | No – usually you get back what you put in | Yes/No – some data collection software allows users to export data in multiple formats | Yes/No – some data repositories permit users to export data in multiple file formats, but not all repositories offer this feature |

Cloud Storage Uses in Data Management and Sharing

Cloud storage services are useful for storing backup copies of data files and documents during the active phase of your research. They are also useful for transferring and sharing files among project team members.

In most cases, cloud storage is not a sufficient mechanism for preserving and providing long-term access to data files and documentation. In some cases, cloud storage can be used to facilitate secure transfers of sensitive data, but these cases are few and depend heavily on the security level of the cloud storage service and the secure transfer protocols it offers. Please consult with UNC ITS or your department IT to determine options for transferring and storing sensitive data.

UNC ITS Storage Offerings

Data Collection Software Uses in Data Management and Sharing

Data collection software such as REDCap or Qualtrics is useful for crafting and fielding surveys; collecting, organizing, and cleaning data; and exporting data for analysis during the life of your research project. Data should not be stored long-term in data collection platforms because the data are not assigned a persistent identifier, nor are they made accessible or available for discovery by secondary users.

A data management and sharing (DMS) plan should describe the process for preparing the data for deposit into a data repository. Once the project is complete, the person(s) responsible for managing and sharing the data should export the files from the data collection software, prepare them for deposit, and ingest them into the data repository identified in the DMS plan for long-term preservation, discovery, and access.

Some of the data collection and management tools used at UNC are:

REDCap

Qualtrics

CDART

Data Repository Uses in Data Management and Sharing

Data repositories are used primarily for preserving and sharing the data underlying the results of a research project. Upon deposit, the repository will generate a persistent unique identifier that will always resolve to the metadata record for those data. Publicly available data should always be de-identified, with protected health information (PHI) and/or personally identifiable information (PII) removed to protect the confidentiality of research participants. Data shared in a data repository are typically stable, with only one version; however, some data may be updated over time, in which case new versions can replace the originals. A data repository should make some form of version history available to track these changes and to increase transparency about how the data files have evolved.
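
As a rough illustration of the de-identification step described above, the sketch below drops direct-identifier columns from CSV data exported for deposit. The function name `drop_identifier_columns` and the column names in `DIRECT_IDENTIFIERS` are hypothetical; a real project would take the list from its own data dictionary. Removing direct identifiers is only a first step, since indirect identifiers and free-text fields still need to be reviewed before data are shared.

```python
import csv
import io

# Hypothetical column names, for illustration only
DIRECT_IDENTIFIERS = {"name", "email", "phone", "date_of_birth"}


def drop_identifier_columns(
    csv_text: str, identifiers: set[str] = DIRECT_IDENTIFIERS
) -> str:
    """Return the CSV text with any listed identifier columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only the columns whose (case-insensitive) name is not an identifier
    kept = [c for c in (reader.fieldnames or []) if c.lower() not in identifiers]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped columns
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    raw = "participant_id,name,email,score\n001,Ada,ada@example.org,7\n"
    print(drop_identifier_columns(raw))
```

Keeping a study-assigned participant ID while removing names and contact details is a common pattern, but whether that ID is itself safe to share depends on the project and should be confirmed with repository staff.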

Not all data can or should be shared in a data repository. If your research data are too sensitive or have legal or ethical limitations on sharing, please consult with data repository staff about alternative options. Some data repositories permit researchers to create metadata records without data, which facilitates discovery of and access requests for sensitive, restricted data. Other repositories may have a more secure method for preserving and sharing data outside of the public-facing data repository. Request a consultation with staff to determine which option best suits your project needs.

Unless the funding agency mandates a specific repository, UNC researchers may choose a repository appropriate for sharing their data. We recommend looking for a domain-specific repository first; if none exists, a generalist data repository may be selected.

Here are a few generalist repositories to consider:

UNC Dataverse (for UNC researchers)

Zenodo

Dryad

Mendeley Data  

References

About – REDCap. (n.d.). Retrieved August 25, 2023, from https://projectredcap.org/about/

UNC ITS. (n.d.). Storage Offerings. Retrieved October 14, 2024, from https://tdx.unc.edu/TDClient/33/Portal/KB/ArticleDet?ID=270

Wikipedia contributors. (2023, July 25). Cloud storage. In Wikipedia, The Free Encyclopedia. Retrieved August 25, 2023, from https://en.wikipedia.org/w/index.php?title=Cloud_storage&oldid=1167077140

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

RDM Guidance formatting was influenced by The Writing Center, University of North Carolina at Chapel Hill Tips & Tools handouts.