Packaging Qualitative Data for Sharing

This article introduces the concept of a qualitative data package that can be shared to comply with data sharing requirements. It provides tips for packaging qualitative data to support research trustworthiness and transparency.

Introduction

As data sharing requirements grow, qualitative researchers face the challenge of sharing the data underlying their results. Data sharing requirements are meant to ensure research integrity and trustworthiness and to enable new scholarly inquiries. To support both goals, this guidance focuses on how qualitative researchers can produce a well-documented, organized data package that supports transparency and trustworthiness.

This guide assumes that you are at the stage of preparing your data package for sharing. It begins with the concept of a qualitative data package and its accompanying materials, then covers documenting the package for transparency and trustworthiness and assessing the materials for sharing, and concludes with additional resources.

What is a Data Package for Qualitative Research?

A data package is a framework to help you assemble, document, and share your research data and materials. This collection makes evident the data, process, and materials that you used in your research. Often, sharing just the data set is not enough for a secondary user to correctly interpret and use your materials. At a minimum, a package contains:

Evidence or data underlying your results (e.g., records, transcripts, social media posts, an NVivo or Atlas.TI project).

Documentation of your research data process, including steps for collection, preparation, and analytical coding or annotations (e.g., methodology brief, instruments, data audit trail, qualitative coding/annotation process, codebook with code/annotation definitions).

Explanation of your data structure and file(s) (e.g., case or document groups).

A description of the files in the data package that explains how the files relate to one another and captures other relevant information. This file is usually called a README.
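If your package contains many files, a small script can help you draft the file list for the README. The Python sketch below is one possible approach, not a required tool: it walks a package folder (the folder you point it at is your choice) and lists each file's relative path, size in bytes, and SHA-256 checksum, which a secondary user could use to confirm that files arrived intact. The function names are hypothetical. You would still need to add, by hand, the file descriptions and relationships described above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading in chunks so large media files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def file_inventory(package_dir):
    """Return (relative path, size in bytes, SHA-256) for every file under package_dir, sorted by path."""
    root = Path(package_dir)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append((path.relative_to(root).as_posix(), path.stat().st_size, sha256_of(path)))
    return rows


def readme_file_list(package_dir):
    """Format the inventory as plain-text lines that can be pasted into a README and annotated."""
    lines = ["File inventory (path | bytes | SHA-256):"]
    for rel, size, digest in file_inventory(package_dir):
        lines.append(f"{rel} | {size} | {digest}")
    return "\n".join(lines)
```

For example, `print(readme_file_list("my_package"))` (folder name hypothetical) prints one line per file. Listing every file mechanically makes it harder to omit something from the README, while the descriptive text still needs your knowledge of the project.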

We have compiled a few exemplar data packages for archival research, qualitative interview, and mixed methods projects to illustrate the types of materials that you could include. Data packages and their contents will vary.

Archival Research	Qualitative Interview Study	Mixed Methods Research (Archival, Survey)
Textual data files (.txt, NVivo); image files (.tif)	De-identified transcripts and memos (.txt, Atlas.TI)	Textual data (.txt, MaxQDA); numeric data (.csv)
Annotations (.txt, PDF/A)	Qualitative codebook with definitions and examples (.txt)	Annotations (.txt, PDF/A); numeric codebook (PDF/A)
Artifact collection strategy (PDF/A)	Schedule of questions (PDF/A)	Archival collection strategy (PDF/A); survey instrument (PDF/A)
Data use agreements (PDF/A)	Informed consent form (PDF/A)	Informed consent form and/or data use agreements (PDF/A)
Methodology brief (PDF/A)	Data collection and coding process brief (PDF/A)	Methodology brief (PDF/A)
README with archival sources (.txt)	README (.txt)	README (.txt)
Data license with terms of use	Data use agreement for full data with PII/PHI (PDF/A)	Data license with terms of use/access

Documenting for Transparency and Trustworthiness

Documenting the research data process and data structure can be challenging: qualitative researchers must decide how much or how little to include and must invest the time and effort to assemble it. However, documentation helps future researchers understand, correctly interpret, and use your data, which supports the trustworthiness of your results. A good question to ask is:

What would someone outside my research team need to know to use my data correctly?

Often, we know all the details because we collected, transcribed, cleaned, and analyzed these data. Imagine that you were bringing a new graduate student onto your team or asking a colleague not involved in this research to audit it. What would you need to tell them to get them up to speed so they could understand your analysis and confirm the trustworthiness of your results? Documentation examples include:

Guidance for fieldwork or collection development
Schedule of questions and instructions for interviews and focus groups
Informed consent forms and approved IRB application
Data citations and attribution if data was obtained from a data producer
Licenses, terms of use, or permissions from data holders
Description of analysis method
Description of fieldwork sites and context
Description of how derived variables or files were created
Coding or annotation schemas/codebook
Researcher’s positionality statement

Assessing for Sharing

Data sharing is not binary (open or closed); there are many ways to make data available (e.g., restricted access, dark archive) and good reasons why you may not be able to make data completely open to the public. It is your responsibility to determine whether the data raise any ethical, legal, or privacy concerns and, ultimately, whether and how it is appropriate to share them.

As you assess your data for sharing, a few considerations include:

Does my data contain personally identifiable information (PII) and/or personal health information (PHI)? Is there any confidential or sensitive information in my data?
How likely is someone to be identified in my data? If you look across the variables, and at combinations of variables, about a participant, how easily could someone guess who this participant is?
Does my data fall under copyright or a data license from the original data producers? If you signed a research data use agreement, are there restrictions on sharing?
Do my informed consent process and forms describe data sharing?
Are there any restrictions on data use (e.g., use only for academic research, or large data sets that cannot be moved) that a future user needs to be aware of?

If these questions raised any red flags for you, you might need to consider alternatives to openly sharing your data. There are many options, such as de-identification, redaction, embargoes, restricted access, terms of use, and data confidentiality agreements. For more information on ways to share data, please consult our guidance on Data Access Restrictions, Data Use Agreements, Sensitive Data, or Terms of Use and Licensing.
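For the question about combinations of variables, a simple count can make re-identification risk more concrete. The Python sketch below applies a k-anonymity-style check to participant metadata: it groups participants by the values of chosen quasi-identifiers (attributes that are harmless alone but identifying in combination) and flags any combination shared by fewer than k people. The records, variable names, and threshold k=3 are made-up assumptions for illustration, not a standard. The count covers only structured attributes; it cannot detect identifying details in transcript text, so it complements rather than replaces careful review.

```python
from collections import Counter


def small_groups(participants, quasi_identifiers, k=3):
    """Return combinations of quasi-identifier values shared by fewer than k participants.

    participants: list of dicts, one per participant (hypothetical metadata).
    quasi_identifiers: keys whose combined values could point to one person.
    k: minimum acceptable group size (an assumed threshold).
    """
    counts = Counter(tuple(p.get(q) for q in quasi_identifiers) for p in participants)
    return {combo: n for combo, n in counts.items() if n < k}


# Illustrative, made-up participant records.
participants = [
    {"id": "P01", "role": "nurse", "site": "Clinic A", "age_band": "30-39"},
    {"id": "P02", "role": "nurse", "site": "Clinic A", "age_band": "30-39"},
    {"id": "P03", "role": "nurse", "site": "Clinic A", "age_band": "30-39"},
    {"id": "P04", "role": "director", "site": "Clinic B", "age_band": "50-59"},
]

print(small_groups(participants, ["role", "site", "age_band"]))
# → {('director', 'Clinic B', '50-59'): 1}
```

Here the only director at Clinic B stands alone in their group, so that combination would be a candidate for generalizing (for example, a broader role or age band) or redacting before sharing.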

Conclusion

This guidance has described how to construct a data package for qualitative research that complies with data sharing requirements and supports transparency and trustworthiness. We have compiled some additional resources below to assist you in developing your package. The Data Curation Network (2023) has developed primers focused on preserving specific data types (e.g., Oral History, Twitter, Atlas.TI) that offer considerations for documentation, file formats, and more. The Qualitative Data Repository at Syracuse University provides valuable guidance on managing, preparing, and sharing data. UNC is a member of the Qualitative Data Repository (QDR), which gives UNC researchers access to QDR curation services. The Qualitative Data Sharing Toolkit provides guidance on planning and preparing qualitative data for sharing.

Resources

Data Curation Network. (2023). Data Curation Primers. https://github.com/DataCurationNetwork/data-primers

Qualitative Data Repository. (2022). Guidance and Resources. https://qdr.syr.edu/guidance

Qualitative Data Sharing (QDS) Project. (2023). Qualitative Data Sharing Toolkit. https://qdstoolkit.org/


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

RDM Guidance formatting was influenced by The Writing Center, University of North Carolina at Chapel Hill Tips & Tools handouts.