Data Management Plan Sample 1
Data Management Plan
The research described in this proposal will generate large volumes of raw data in the form of digital images. Each CT scan of an individual ultimately yields three sets of files. The first set consists of the original angular projection images in TIFF format, produced when X-rays strike the detector of the CT scanner (Fig. DMP-1). The second set consists of the cross-sectional slices in BMP format, reconstructed from these projections by the NRecon software package (Fig. DMP-2).
From these cross-sectional image stacks, the body parts of interest are segmented using Amira software, producing the third set of files, the segmented models. For one individual, these image files and models require 770 MB of disk space, and the SPHARM analysis produces an additional set of files requiring 4 MB per individual. Depending on the sample sizes of the Selection in the Wild study (Section IV.A), up to 4,000 individuals will be scanned for this project, so storing the raw image data and models alone will require ~3.1 TB of disk space.
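The storage estimate above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal units (1 TB = 1,000,000 MB) and that every scanned individual also has a 4 MB SPHARM output; the function name `archive_size_tb` is illustrative only.

```python
# Storage estimate using the per-individual sizes stated in the plan.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def archive_size_tb(n_individuals: int,
                    mb_per_scan: float = 770.0,
                    mb_per_spharm: float = 4.0) -> float:
    """Total disk space in TB for scans, models, and SPHARM outputs."""
    total_mb = n_individuals * (mb_per_scan + mb_per_spharm)
    return total_mb / MB_PER_TB


if __name__ == "__main__":
    # Upper bound from the plan: 4,000 individuals.
    print(f"{archive_size_tb(4000):.3f} TB")  # prints: 3.096 TB
```

With 4,000 individuals the total is 4,000 × 774 MB = 3,096,000 MB, which rounds to the ~3.1 TB quoted in the plan.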
These data will be archived on the RStor system, a secure, highly available, redundant large-data storage system administered by the Dartmouth Research Computing Group, at a cost of $500 per TB per year. We currently archive all of our data on this system, including >900 digital specimens (~0.7 TB). We plan to archive all files associated with each digital specimen for the next 20 years. Because we segment only one of the many important body parts imaged in these scans, others studying damselfly morphometrics may find these data useful in the future. At the end of that 20-year period, we will evaluate the need to keep these files in the archive.
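A rough projection of archive cost at the stated RStor rate follows. It is a sketch under simplifying assumptions: a flat $500 per TB per year, the full ~3.1 TB projected for this project held for every year of the 20-year retention period, and the ~0.7 TB already archived excluded. In practice the archive would grow as scanning proceeds, so early-year costs would be lower. The function name is illustrative.

```python
# Rough archive cost projection at the RStor rate stated in the plan.
# Assumptions: flat rate, constant archive size over the whole period.
RATE_USD_PER_TB_YEAR = 500.0


def archive_cost_usd(size_tb: float, years: int,
                     rate: float = RATE_USD_PER_TB_YEAR) -> float:
    """Cost in USD of storing size_tb terabytes for the given years."""
    return size_tb * years * rate


if __name__ == "__main__":
    project_tb = 3.1  # projected data for this project only
    print(f"per year: ${archive_cost_usd(project_tb, 1):,.0f}")   # $1,550
    print(f"20 years: ${archive_cost_usd(project_tb, 20):,.0f}")  # $31,000
```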
We have established a standardized protocol for directory structure and for directory and file naming that identifies the project and each digital specimen in the archive. Directory and file names contain the species code, source lake, study ID, and a unique individual identifier. Metadata files describing these codes are kept alongside the specimens in the archive.
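One way to encode such a naming protocol is sketched below. The plan specifies only which fields appear in the names; the field order, the `_` separator, the study/species/lake directory hierarchy, and the example codes are all hypothetical placeholders, not the lab's actual conventions.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical rendering of the naming protocol. The plan states that names
# carry the species code, source lake, study ID, and a unique individual ID;
# the field order, "_" separator, and directory layout are assumptions.
SEP = "_"


@dataclass(frozen=True)
class SpecimenID:
    species: str     # species code
    lake: str        # source lake
    study: str       # study ID
    individual: str  # unique individual identifier

    def stem(self) -> str:
        """Base name shared by all files of one digital specimen."""
        values = (self.species, self.lake, self.study, self.individual)
        for value in values:
            # An empty field or an embedded separator would make names ambiguous.
            if not value or SEP in value:
                raise ValueError(f"invalid field {value!r}")
        return SEP.join(values)

    def directory(self) -> PurePosixPath:
        """Archive directory for this specimen: study/species/lake/<stem>."""
        return PurePosixPath(self.study, self.species, self.lake, self.stem())

    @classmethod
    def parse(cls, stem: str) -> "SpecimenID":
        """Recover the four fields from a stem built by stem()."""
        parts = stem.split(SEP)
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"expected 4 non-empty fields in {stem!r}")
        return cls(*parts)


if __name__ == "__main__":
    # Placeholder codes, not real species, lake, or study identifiers.
    sid = SpecimenID("EXSP", "LAKEA", "ST01", "0001")
    print(sid.stem())       # EXSP_LAKEA_ST01_0001
    print(sid.directory())  # ST01/EXSP/LAKEA/EXSP_LAKEA_ST01_0001
```

Rejecting empty fields and embedded separators keeps every name reversible, so the metadata fields can always be recovered from a file name alone.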
Members of the McPeek lab (undergraduates, graduate students, postdoctoral associates, and faculty collaborators) first place digital specimens and the associated files produced by analyses in a top-level directory on the Northstar cluster to which all lab members have read/write/modify privileges. McPeek alone is then responsible for moving these uploaded digital specimens to their final locations within the archive.
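The curation step could be partly automated with a small script such as the sketch below. It is hypothetical: it assumes staged entries are named `SPECIES_LAKE_STUDY_INDIVIDUAL` (optionally followed by a file extension) and that the archive is organized as study/species/lake/. Neither convention is taken from the lab's documented protocol. Misnamed entries are left in the staging directory for manual review, and the default is a dry run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical curation helper for the staging-to-archive step.
# Assumed naming: SPECIES_LAKE_STUDY_INDIVIDUAL[.ext]; assumed archive
# layout: study/species/lake/. Both are illustrative only.
SEP = "_"


def archive_destination(archive_root: Path, name: str) -> Optional[Path]:
    """Return where a staged file or folder belongs, or None if misnamed."""
    stem = name.split(".", 1)[0]
    parts = stem.split(SEP)
    if len(parts) != 4 or not all(parts):
        return None
    species, lake, study, _individual = parts
    return archive_root / study / species / lake / name


def move_staged(staging: Path, archive_root: Path,
                dry_run: bool = True) -> List[Path]:
    """Move correctly named entries from staging into the archive.

    Misnamed entries, and entries whose destination already exists, are
    left in place. With dry_run=True nothing is moved; the planned
    destinations are only returned.
    """
    planned = []
    for src in sorted(staging.iterdir()):
        dest = archive_destination(archive_root, src.name)
        if dest is None or dest.exists():
            continue
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
        planned.append(dest)
    return planned


if __name__ == "__main__":
    # Demo in a throwaway directory with placeholder names.
    with tempfile.TemporaryDirectory() as tmp:
        staging, archive = Path(tmp, "staging"), Path(tmp, "archive")
        staging.mkdir()
        (staging / "EXSP_LAKEA_ST01_0001.stl").write_text("mesh")
        (staging / "notes.txt").write_text("left for review")
        for dest in move_staged(staging, archive, dry_run=False):
            print(dest.relative_to(archive))
```

Running with the default `dry_run=True` first lets the curator review the planned moves before anything in the shared directory changes.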
Our data sharing policy for these digital specimens is fully open access: we send all files associated with digital specimens (including triangular mesh models) to anyone who requests them. We have found no appropriate public repository for depositing this large volume of raw digital image data. For all specimens used in recent papers, we have deposited collections of triangular mesh models, which are probably the files most useful to other researchers, in the Dryad repository. We will likewise deposit all triangular mesh models produced by the research described in this proposal, along with associated metadata, in Dryad. McPeek is also developing a web interface that will provide read-only access to cross-sectional image stacks and triangular mesh models from a new server in the McPeek laboratory.
McPeek is a member of the Research Data Work Group at Dartmouth. This group is charged by the Provost's Office with developing data management strategies and protocols for researchers at Dartmouth. The data management plan for this project will evolve as the group produces its recommendations.