NSF Data Management Plan Checklist
What will you be producing? - Types of Data
Observational data captured around the time of the event
- Examples: Sensor readings, telemetry, survey results, neuroimages
- Usually irreplaceable
Experimental data from lab equipment
- Examples: gene sequences, chromatograms, toroid magnetic field readings
- Often reproducible, but can be lengthy and expensive
Simulation data generated from test models
- Examples: climate models, economic models
- Models and metadata (inputs) are often more important than the output data
- Reproducible, but possibly expensive
Derived or compiled data
- Examples: text and data mining, compiled database, 3D models
- Reproducible, but possibly expensive
Samples and other non-digital data forms
- Samples, physical collections, notebooks
- All may be considered data for the purposes of presenting a data management plan
Other Data Examples
- Digital Data
- Software
- Samples
- Curricular Materials
- Physical Collections
File Types
Text: e.g. ASCII, Word, PDF
Numerical: e.g. ASCII, SAS, Stata, Excel, netCDF, HDF
Database: e.g. MySQL, MS Access, Oracle
Multimedia: e.g. JPEG, TIFF, DICOM, MPEG, QuickTime
Models: e.g. 3D VRML, X3D
Software: e.g. Java, C
Domain Specific: e.g. FITS in Astronomy, CIF in Chemistry
Vendor Specific: e.g. Varian NMR data format, LeCroy digital oscilloscope format.
Where will the data be stored? - Data Storage
Personal computer
Cloud storage
Lab server
ThayerFS
Webserver
rSTor
Data Backup
Frequency - How often?
Location(s) of backups or file copies - Office, building, off-site
System or software - What kind? College backup (NetBackup), Retrospect, or an online service such as Mozy or Carbonite
Testing procedures - Will you test the restore process to make sure backups are working correctly?
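One way to test the restore process is to restore a backup to a scratch location and compare it with the live copy. The sketch below is a minimal illustration, not a prescribed procedure: the function name verify_restore and both directory arguments are hypothetical, and it assumes the original files have not changed since the backup was taken.

```python
import filecmp
from pathlib import Path


def verify_restore(original_dir, restored_dir):
    """Return relative paths of files that are missing or differ after a restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        restored = restored_dir / rel
        # shallow=False compares file contents instead of trusting size and timestamp.
        if not restored.is_file() or not filecmp.cmp(src, restored, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every original file was found in the restored copy with identical contents. Files that exist only in the restored copy are not reported.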
Levels of Data
What are levels of data?
- Raw data -> Cleaned data -> Processed data -> Summary Level data -> Publication data
- Metadata: information about the data
How long will you keep the data?
What procedures do you envision for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity cease to exist?
How will you document your data?
Is there good project and data documentation?
What directory and file naming convention will be used?
Will you be using versioning controls?
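A file naming convention is easier to follow when names are generated rather than typed by hand. The sketch below assumes one possible pattern, project_description_YYYYMMDD_vNN.ext; the pattern, the function name build_filename, and the example values are all illustrative assumptions, not a required standard.

```python
import re
from datetime import date


def build_filename(project, description, when, version, extension):
    """Build a name such as 'coastal-survey_tide-gauge_20240115_v02.csv'."""

    def slug(text):
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".")
    # Zero-padded version and ISO-ordered date keep files sorted chronologically.
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"


print(build_filename("Coastal Survey", "Tide gauge", date(2024, 1, 15), 2, ".csv"))
# prints: coastal-survey_tide-gauge_20240115_v02.csv
```

Putting the date in year-month-day order and padding the version number means a plain alphabetical listing also sorts files by date and version.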
Metadata
Title: Name of the dataset or research project that produced it
Creator: Names and addresses of the organization or people who created the data
Identifier: Number used to identify the data, even if it is just an internal project
reference number
Subject: Keywords or phrases describing the subject or content of the data
Funders: Organizations or agencies who funded the research
Dates: Key dates associated with the data, including: project start and end date;
release date; time period covered by the data; and other dates associated with the
data lifespan, e.g., maintenance cycle, update schedule
Location: Where the data relates to a physical location, record information about
its spatial coverage
Methodology: How the data was generated, including equipment or software used, experimental
protocol, other things one might include in a lab notebook
Sources: Citations to material for data derived from other sources, including details
of where the source data is held and how it was accessed
List of file names: List of all data files associated with the project, with their
names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
File formats: Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software
required to read the data
File Structure: Organization of the data file(s) and the layout of the variables,
when applicable
Variable List: List of variables in the data files, when applicable
Code Lists: Explanation of codes or abbreviations used in either the file names or
the variables in the data files (e.g. '999 indicates a missing value in the data')
Versions: Date/time stamp for each file, and a separate ID for each version (see
organizing your files)
Checksums: To test if your file has changed over time.
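As a minimal sketch of the checksum item above: record a SHA-256 digest for every file when the data is deposited, then recompute later and compare. The function names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are assumptions for illustration; any stable hash and manifest format would serve.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir, manifest):
    """List files that were added, removed, or altered since the manifest was made."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

The manifest can be stored alongside the other metadata (for example as a text or JSON file) so that anyone receiving the data can confirm it arrived unchanged.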
What are my options for sharing? - Data Sharing
- Self-dissemination
- Discipline based repositories
- Institutional repositories
- Websites - www.dartmouth.edu account, departmental server, hosted server space
- Cloud (Amazon, Rackspace, Google, etc.)
- Restricted use collections
Privacy & Security
- Protected personal information: medical (HIPAA), student information (FERPA), other?
- National security?
- Patent related
- Other confidentiality concerns
- Informed consent
Other
How will the data management plan maximize the value of the data?
IMPACT: What is the possible impact of the data within the immediate field, in other fields, and any broader, societal impact?
What about transfer of people or data?