What will you be producing?
NSF's list of items
Types of data
- Observational data captured around the time of the event
  - Examples: sensor readings, telemetry, survey results, neuroimages
  - Usually irreplaceable
- Experimental data from lab equipment
  - Examples: gene sequences, chromatograms, toroid magnetic field readings
  - Often reproducible, but reproduction can be lengthy and expensive
- Simulation data generated from test models
  - Examples: climate models, economic models
  - The models and metadata (inputs) are often more important than the output data
  - Reproducible, but possibly expensive
- Derived or compiled data
  - Examples: text and data mining, compiled databases, 3D models
  - Reproducible, but possibly expensive
- Samples and other non-digital forms of data
  - Examples: physical samples, collections, notebooks
  - All may be considered data for the purposes of presenting a DMP
How are you generating the data?
- Textual: e.g. ASCII, Word, PDF
- Numerical or statistical: e.g. ASCII, SAS, Stata, Excel, netCDF, HDF
- Databases: e.g. MySQL, MS Access, Oracle
- Images and multimedia: e.g. JPEG, TIFF, DICOM, MPEG, QuickTime
- 3D models: e.g. VRML, X3D
- Software and code: e.g. Java, C
- Discipline-specific formats: e.g. FITS in astronomy, CIF in chemistry
- Instrument-specific formats: e.g. Varian NMR data format, LeCroy digital oscilloscope format
Are you using a sustainable digital format, i.e. one that will remain compatible, for the foreseeable future, with the software needed to open and read it?
Will these file types be long-lived?
Are there tools or software you will need to process or view the data that need to be archived along with the data?
How fast will the data be growing?
Where will the data be stored?
- Personal computer
- Cloud storage
- Lab server
What about backups (or copies of) your data?
- Frequency: how often will you back up?
- Location(s) of backups or file copies: office, building, off-site
- System or software: college backup (NetBackup), Retrospect, online services such as Mozy or Carbonite
- Testing procedures: will you test the restore process to make sure backups are working correctly?
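The two backup steps above, copying files and testing the restore, can be sketched in a few lines of Python. This is a minimal illustration of a local timestamped-copy scheme, not a substitute for a real backup system like NetBackup or Retrospect; the folder layout and function names are illustrative assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path

def back_up(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = backup_root / stamp / source.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, dest)  # copy2 also preserves file timestamps
    return dest

def restore_ok(original: Path, backup: Path) -> bool:
    """Test the restore: byte-for-byte comparison of the two files."""
    return original.read_bytes() == backup.read_bytes()
```

A scheduled job could call `back_up` nightly and `restore_ok` on a sample of files, so a silently failing backup is caught before the data is needed.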
What are the levels of data?
Raw data -> Cleaned data -> Processed data -> Summary-level data -> Publication data
Metadata: information about the data.
What will you keep?
How long will you keep the data?
What procedures are envisioned for long-term archiving and preservation of the data, including succession plans should the expected archiving entity go out of existence?
How will you document your data?
Is there good project and data documentation?
What directory and file naming convention will be used?
Will you be using versioning controls?
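A naming convention is easiest to follow if it is generated rather than typed by hand. The sketch below assumes one plausible convention, project_date_description_version; the pattern and function name are illustrative, not a standard.

```python
from datetime import date
from typing import Optional

def data_file_name(project: str, description: str, version: int,
                   ext: str, when: Optional[date] = None) -> str:
    """Build a name like 'soilsurvey_2024-05-01_tempreadings_v02.csv'."""
    when = when or date.today()

    def slug(s: str) -> str:
        # Lowercase and strip spaces so names are safe across filesystems.
        return s.lower().replace(" ", "")

    return f"{slug(project)}_{when.isoformat()}_{slug(description)}_v{version:02d}.{ext}"
```

Zero-padded version numbers (`v02`) keep files sorting correctly in a directory listing, which is the main practical benefit of a convention like this.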
What metadata will you include to describe contextual information about who collected/created it, the date, instrument settings, etc.?
- Name of the dataset or research project that produced it
- Names and addresses of the organization or people who created the data
- Number used to identify the data, even if it is just an internal project reference number
- Keywords or phrases describing the subject or content of the data
- Organizations or agencies who funded the research
- Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule
- Where the data relates to a physical location, record information about its spatial coverage
- How the data was generated, including equipment or software used, experimental protocol, other things one might include in a lab notebook
- Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed
- File names
  - List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
- File formats
  - Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data
- File structure
  - Organization of the data file(s) and the layout of the variables, when applicable
- Variable list
  - List of variables in the data files, when applicable
- Code lists
  - Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
- Date/time stamps and version IDs
  - Record a date/time stamp for each file and use a separate ID for each version (see organizing your files) so you can test whether a file has changed over time.
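One common way to test whether a file has changed over time is to record a checksum alongside the date stamp; if the stored digest no longer matches, the file has been modified or corrupted. A minimal sketch using Python's standard `hashlib`:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 64 KiB blocks so large data files don't need to fit in memory.
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()
```

Storing the digest in the metadata record at deposit time gives later users a fixity check they can rerun at any point.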
Is there an ontology or other community standard for data sharing/integration?
Who will assign the metadata?
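Whoever assigns the metadata, the descriptive fields listed above are easiest to reuse if captured in a machine-readable record deposited alongside the data. The field names and values below are illustrative only, not a formal schema; a real plan might map them onto Dublin Core or DataCite terms (the file names come from the example above).

```python
import json

# Hypothetical metadata record -- field names are illustrative, not a standard.
record = {
    "dataset_title": "NW Palace 3D Survey",           # assumed example project
    "creators": ["J. Doe, Example University"],       # placeholder creator
    "identifier": "PROJ-2024-001",                    # internal project reference
    "keywords": ["archaeology", "3D model"],
    "funders": ["NSF"],
    "dates": {"start": "2024-01-01", "end": "2024-12-31"},
    "files": [
        {"name": "NWPalaceTR.WRL", "format": "VRML"},
        {"name": "stone.mov", "format": "QuickTime"},
    ],
    "code_lists": {"999": "missing value in the data"},
}

# Write the record as a sidecar file next to the data it describes.
with open("dataset_metadata.json", "w") as f:
    json.dump(record, f, indent=2)
```

A sidecar JSON file like this survives being copied around with the data, whereas metadata kept only in a lab notebook or an email thread usually does not.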
Data quality issues
Are there any data quality issues?
How will the (technical) quality of the data be assured?
How will adherence to this data management plan be checked or demonstrated?
Audience: Who are the potential secondary users of the data?
How will data be made available for public use and secondary uses?
What are my options for sharing?
- Discipline based repositories
- Institutional repositories
- Websites - www.dartmouth.edu account, departmental server, hosted server space
- Cloud (Amazon, Rackspace, Google, etc.)
- Restricted use collections
Are there any embargo periods?
What kind of rights will be granted to different user groups?
Who will decide on access to the data?
Include explanations about how data may be re-used and how the source of the data should be acknowledged
What are the plans for preserving data in accessible form?
Any sharing requirements, e.g. a funder's data sharing policy?
If there are partnerships, how will data be shared and managed with partners?
Are there privacy issues?
Protected personal information: medical (HIPAA), student information (FERPA), or other?
Other confidentiality concerns
Will you have physical restrictions, like firewalls or off-network devices?
Will you or the institution have policies that restrict access or enforce security measures?
Who will make decisions about data security?
How will the data management plan maximize the value of the data?
IMPACT: What is the possible impact of the data within the immediate field, in other fields, and any broader, societal impact?
What about transfer of people or data?