Documentation - Codebook
This article provides instructions and tips for creating a codebook (sometimes referred to as a ‘data dictionary’) that describes the contents of a research data file or files.
What is a Codebook?
A codebook is a document that describes an individual data file or a collection of data files. It should answer the questions a user is likely to have when looking at your data. Organize it so that users can tell which data file or files are being described, and include variable names, variable labels, values, and value labels for all variables.
What Should be Included in a Codebook/Data Dictionary?
In general, a codebook/data dictionary should include the following information:
Title of the project and contact information for PI(s).
For projects with multiple data files, each section of the codebook should be labeled with the data filename it is describing. For example: Section 1. Datafile1.dta; Section 2. Datafile_country.dta.
A list of all variables in the data file(s), each with the variable name used in the data file and a variable label or description of its content.
For example:
Variable Name | Variable Label
gender | Whether or not the respondent is female
A full description of all values and value labels, including null or missing values.
For example:
Variable Name | Variable Label | Values & Value Labels
gender | Whether or not the respondent is female | male = 0; female = 1; missing = N/A
Information on the structure of the data, as needed. For databases, provide table names, keys, and their relationships (e.g., an ER diagram). For text files, provide formatting information (e.g., whether the first row contains variable names and whether the file is tab or comma delimited).
Any additional information necessary for understanding the content of the data file(s) such as standard measures used or data source citations.
A codebook should describe the contents of your data file(s) completely enough that a secondary user can match it against the data without having to ask further questions.
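The checklist above can be sketched in base R as a small data dictionary table that is checked against the data before being saved. This is only an illustrative sketch: the `survey` data, the `age` variable and its label, and the output file name are hypothetical and not part of any required layout.

```r
# Hypothetical example data; names and codes are illustrative only.
survey <- data.frame(
  gender = c(0, 1, NA, 1),
  age    = c(34, 51, 27, NA)
)

# A hand-written data dictionary: one row per variable.
dictionary <- data.frame(
  variable_name  = c("gender", "age"),
  variable_label = c(
    "Whether or not the respondent is female",
    "Age of the respondent in years"          # hypothetical label
  ),
  values = c(
    "male = 0; female = 1; missing = NA",
    "whole years; missing = NA"
  ),
  stringsAsFactors = FALSE
)

# Variables in the data that the dictionary does not describe, and
# dictionary rows that do not match any variable in the data.
undocumented <- setdiff(names(survey), dictionary$variable_name)
orphaned     <- setdiff(dictionary$variable_name, names(survey))

# Stop early if the dictionary and the data file disagree.
stopifnot(length(undocumented) == 0, length(orphaned) == 0)

# Save the dictionary as a comma-delimited text file with a header row.
out_file <- file.path(tempdir(), "codebook.csv")
write.csv(dictionary, file = out_file, row.names = FALSE)
```

The two `setdiff()` calls are the useful part: they catch a variable that was added to the data but never documented, or a dictionary entry left behind after a variable was renamed.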
What Format is Recommended for a Codebook?
We recommend providing the codebook as a text document saved as a PDF for long-term preservation and access.
How to Use Statistical Software to Generate a Codebook
Stata and R both offer functions (in R, through contributed packages such as codebook and dataMaid) that can generate much of the variable information your codebook should include. How useful the output is, however, depends on the variable labels and value labels stored in the data file. Before running these commands, double-check that every variable has a label and that all of its values are defined in the dataset.
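As a rough illustration of that pre-check in R, the sketch below lists variables that have no variable label. It assumes labels are stored in a `"label"` attribute on each column, a convention used by packages such as haven and Hmisc rather than a base R standard; the `survey` data is hypothetical.

```r
# Hypothetical data. Value labels for gender are stored as factor labels.
survey <- data.frame(
  gender = factor(c(0, 1, NA, 1), levels = c(0, 1), labels = c("male", "female")),
  age    = c(34, 51, 27, NA)
)

# Attach a variable label to gender; age is deliberately left unlabelled.
attr(survey$gender, "label") <- "Whether or not the respondent is female"

# TRUE for each column that carries a "label" attribute.
has_label <- vapply(
  survey,
  function(x) !is.null(attr(x, "label")),
  logical(1)
)

# Names of the variables that still need a label.
unlabelled <- names(survey)[!has_label]
print(unlabelled)
# [1] "age"
```

Any variable reported here would appear in a generated codebook with a name but no description, so it is worth fixing before generating the document.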
Stata Example:
* Using the codebook command in Stata
https://www.stata.com/manuals/dcodebook.pdf
Syntax:
codebook [ varlist ] [ if ] [ in ] [ , options ]
* Using the labelbook command in Stata
https://www.stata.com/manuals/dlabelbook.pdf
Syntax:
labelbook [ lblname-list ] [ , labelbook_options ]
R Example:
# Using the codebook package in R
# Arslan, R. (2020, June 6). Package ‘codebook’. Version 0.9.2. CRAN Repository. https://cran.r-project.org/web/packages/codebook/
Syntax:
codebook(
  results,
  reliabilities = NULL,
  survey_repetition = c("auto", "single", "repeated_once", "repeated_many"),
  detailed_variables = TRUE,
  detailed_scales = TRUE,
  survey_overview = TRUE,
  missingness_report = TRUE,
  metadata_table = TRUE,
  metadata_json = TRUE,
  indent = "#"
)
# Using the dataMaid package in R
# Petersen, A., & Ekstrøm, C. (2019, December 10). Package ‘dataMaid’. Version 1.4.0. CRAN Repository. https://cran.r-project.org/package=dataMaid
Syntax:
makeCodebook(data, vol = "", reportTitle = NULL, file = NULL, ...)
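A minimal call might look like the sketch below. It assumes the dataMaid package and a working R Markdown setup are installed, uses only arguments shown in the syntax above, and runs on a hypothetical `survey` data frame; where the output is written and in which format depends on the package defaults, so check the package documentation before relying on it.

```r
# Hypothetical data with one labelled variable.
survey <- data.frame(
  gender = factor(c("male", "female", NA, "female")),
  age    = c(34, 51, 27, NA)
)
attr(survey$gender, "label") <- "Whether or not the respondent is female"

# Only attempt to build the codebook if dataMaid is available.
if (requireNamespace("dataMaid", quietly = TRUE)) {
  # The report title is illustrative; output location follows package defaults.
  dataMaid::makeCodebook(survey, reportTitle = "Survey codebook")
} else {
  message("The dataMaid package is not installed; skipping codebook generation.")
}
```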
Additional Guidance
Codebooks and data dictionaries come in a variety of formats. There is no single standard, and some disciplines have their own conventions. The most important thing to remember is that secondary users rely on the codebook to fully understand the contents of a data file, so make sure it gives them that information.
The following resources may help you format your own codebook:
Social Sciences: https://www.icpsr.umich.edu/web/ICPSR/cms/1983
Health Sciences: https://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/CodebookCookbook/CodebookCookbook.pdf
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
RDM Guidance formatting was influenced by The Writing Center, University of North Carolina at Chapel Hill Tips & Tools handouts.