Public datasets hosted by DBIC

Recently, the worldwide neuroscience community has seen a rise in data-sharing efforts aimed at increasing the transparency and reproducibility of experimental findings. As a result, many collated and annotated functional imaging datasets have become publicly available. Access to such resources gives investigators and students a great opportunity to explore and test models of brain function and cognition. However, many datasets are very large and can be challenging for individuals to access. To facilitate access and promote the use of publicly available data resources at DBIC, we host several datasets on the Dartmouth Research Computing file system, where they are readily available to DBIC users for analysis on the Discovery cluster.

List of hosted datasets

Contact

    For questions about existing datasets or requests to add new datasets, please contact Jamie Ford (James.C.Ford@dartmouth.edu).

Dartmouth Brain Imaging Center (DBIC) QA dataset

The Dartmouth Brain Imaging Center's QA data, converted to BIDS standard, can be accessed using the DataLad file-sharing and data-versioning utility (see: https://www.datalad.org/).

  • Path to DBIC QA data on Rolando: /inbox/BIDS/dbic/QA
  • Access: Open (no restrictions to rc-DBIC users)
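
As a rough sketch of fetching file content with DataLad from Python, the snippet below builds a `datalad get` command and runs it only if the `datalad` command-line tool is found. It assumes the QA path listed above is a DataLad dataset on the machine where the code runs. The helper names and the example subdirectory `sub-qa` are hypothetical and used only for illustration.

```python
import shutil
import subprocess
from pathlib import PurePosixPath

# Path taken from the listing above; assumed to be a DataLad dataset.
QA_DATASET = "/inbox/BIDS/dbic/QA"


def build_get_command(dataset, relpath):
    """Return a `datalad get` command for a path inside the dataset."""
    target = str(PurePosixPath(dataset) / relpath)
    return ["datalad", "get", "-d", dataset, target]


def fetch(dataset, relpath):
    """Run `datalad get` if the CLI is available; return True on success."""
    if shutil.which("datalad") is None:
        print("datalad is not on PATH; install or load it first")
        return False
    result = subprocess.run(build_get_command(dataset, relpath))
    return result.returncode == 0


if __name__ == "__main__":
    # "sub-qa" is a hypothetical subdirectory name, shown only as an example.
    print(" ".join(build_get_command(QA_DATASET, "sub-qa")))
```

Calling `fetch(QA_DATASET, "sub-qa")` would then download the content of that (hypothetical) subdirectory, provided DataLad is installed and the path exists.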

Online resources

Narratives dataset

The Narratives dataset is a large collection of fMRI studies of natural language processing conducted at the Princeton Neuroscience Institute by the labs of Professors Ken Norman and Uri Hasson and procured by DBIC alumnus Dr. Sam Nastase. It consists of normalized 3T fMRI data collected while 345 subjects listened to spoken-word narratives, covering 28 unique stories and over 800 scanning sessions. The dataset is a benchmark for testing models of language processing and comprehension.

  • Path to the Narratives dataset on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/narratives
  • Access: Open (no restrictions to rc-DBIC users)
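
For a first look at a hosted dataset, a short sketch like the following lists its subject directories. It assumes the hosted copy follows the BIDS convention of top-level `sub-<label>` directories, which is not confirmed above; `list_subjects` is a hypothetical helper, not a DBIC tool.

```python
from pathlib import Path

# Path taken from the listing above.
NARRATIVES = Path("/dartfs/rc/lab/D/DBIC/DBIC/archive/narratives")


def list_subjects(root):
    """Return the sorted names of top-level sub-* directories under root."""
    root = Path(root)
    if not root.is_dir():
        # Path missing or not mounted on this machine.
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.startswith("sub-")
    )


if __name__ == "__main__":
    subjects = list_subjects(NARRATIVES)
    print(f"{len(subjects)} subject directories found under {NARRATIVES}")
```

The same helper can be pointed at any of the other dataset paths on this page, with the same caveat about directory layout.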

Online resources

Natural Scenes Dataset

The Natural Scenes Dataset (NSD) is a large ultra-high-field 7T fMRI dataset collected at the Center for Magnetic Resonance Research at the University of Minnesota. It consists of high-resolution fMRI data from 8 adult subjects who viewed thousands of images of natural scenes, with 30 to 40 scanning sessions per subject. The NSD is an excellent resource for testing models of visual representation and cognition.

  • Path to the NSD on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/NSD
  • Access: Data Use Agreement (DUA) and group limited access permission
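
Because access is limited by group permission, it can help to confirm that your account can read the directory before launching a job. The following is a minimal sketch, assuming it is run on a Discovery node where the path is mounted; `can_read` is a hypothetical helper.

```python
import os

# Path taken from the listing above.
NSD_PATH = "/dartfs/rc/lab/D/DBIC/DBIC/archive/NSD"


def can_read(path):
    """Return True if path is an existing directory the current user can list and enter."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


if __name__ == "__main__":
    if can_read(NSD_PATH):
        print("Read access to the NSD directory is in place.")
    else:
        print("No read access yet (or the path is not mounted on this machine).")
```

A negative result does not distinguish between missing group membership and a path that is not mounted, so check both before requesting access.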

Online resources

The Human Connectome Project

The Human Connectome Project (HCP) is a massive collaborative undertaking led by PIs Dr. David Van Essen of Washington University and Dr. Kamil Ugurbil of the University of Minnesota. The DBIC currently hosts the HCP 1200 Young Adult dataset, which contains resting-state fMRI data from over 1100 healthy adults aged 22-35.

  • Path to HCP dataset on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/HCP/HCP1200
  • Access: Data Use Agreement (DUA) and group limited access permission

Information about the directory structure and file names can be found HERE.

Online resources

Healthy Brain Network

The Healthy Brain Network (HBN) is a large 3T fMRI dataset collected by the Child Mind Institute to study the developing human brain (ages 5-21 years). The full study includes eye tracking, EEG, and rich demographic and behavioral data that can be found here. The data hosted by DBIC include only the fMRI data, acquired during resting state, eye-tracking calibration, and viewing of two short movie clips ("Despicable Me" and "The Present"). We use the version from "Reproducible Brain Charts (RBC)", restricted to the 845 subjects who passed the quality check by the RBC team (up to "Release 9" by the HBN team). The paper on "Reproducible Brain Charts" can be found here. In addition to the downloaded raw BIDS data and FreeSurfer 6 outputs, custom processing derivatives from fMRIPrep and nb_prep are stored.

  • Path to HBN dataset on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/HBN
  • Access: Creative Commons (CC license) (open with restrictions)
    • The HBN data uses a Creative Commons license with "BY-NC-SA" restrictions.
      • BY (attribution): citation requirements on eventual publications
      • NC (non-commercial)
      • SA (share-alike): if you reshare you must keep the CC restrictions

Online resources

Amsterdam Open MRI Collection (AOMIC)

The Amsterdam Open MRI Collection (AOMIC) is a collection of three independent, open-access datasets (ID1000, PIOP1, and PIOP2) collected at 3T and totaling over 1,300 unique participants (N=928, N=216, and N=226, respectively). Participants received T1-weighted, diffusion-weighted (DWI), and functional MRI scans. The functional data in the largest component, ID1000, come from 19-26 year olds watching a movie of natural scenes, while the two PIOP datasets include resting-state and task-based fMRI of university students. The task-based scans target emotion matching and working memory (both PIOP datasets), face perception, cognitive control, and emotion anticipation (PIOP1 only), and response inhibition (PIOP2 only). Demographic and psychometric variables are also included.

  • Path to the AOMIC datasets on Discovery: /vast/labs/DBIC/datasets/Amsterdam-Open-MRI
  • Access: Open (no restrictions to rc-DBIC users)

Online Resources

  • Snoek, L., van der Miesen, M.M., Beemsterboer, T. et al. The Amsterdam Open MRI Collection, a set of multimodal MRI datasets for individual difference analyses. Sci Data 8, 85 (2021). https://doi.org/10.1038/s41597-021-00870-6 

Bissett Self-Regulation dataset

This dataset includes neuroimaging and behavioral data collected to examine the general construct of self-regulation. It consists of anatomical MRI, resting-state fMRI, and task-based fMRI (Stroop and Stop-signal) along with various self-report surveys from 103 healthy adult subjects.

  • Path to dataset on Discovery: /vast/labs/DBIC/datasets/Bissett-Self-Regulation
  • Access: Open (no restrictions to rc-DBIC users)

Online Resources

  • Bissett, P.G., Eisenberg, I.W., Shim, S. et al. Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation. Sci Data 11, 809 (2024). https://doi.org/10.1038/s41597-024-03636-y 

Individual Brain Charting

The Individual Brain Charting (IBC) dataset is a high-resolution 3T fMRI dataset collected at NeuroSpin, CEA Saclay, France. It consists of extensive functional imaging data from 12 participants who each underwent multiple scanning sessions. The paradigms cover over 50 task conditions across cognitive domains such as visual perception, language, comprehension, and working memory. Participants also completed multiple naturalistic movie-watching sessions. The IBC dataset is particularly valuable for studying individual variability in brain organization and for developing high-precision functional atlases. Two versions are installed: /IBC (EBRAINS, most complete) and /IBC_openneuro (more strictly BIDS compliant and standardized, slightly fewer sessions).

  • Path to dataset on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/Individual-Brain-Charting
  • Access: Open (no restrictions to rc-DBIC users)

Online resources

The Moth

The Moth dataset is a naturalistic 3T fMRI dataset collected at the University of Texas at Austin by the lab of Professor Alexander Huth. It contains densely sampled fMRI data from 8 participants who listened to naturalistic narratives from The Moth Radio Hour. Functional localizers are also available for all participants. This dataset often serves as a benchmark for encoding model performance in predicting language-associated brain activity.

  • Path to the dataset on Discovery: /dartfs/rc/lab/D/DBIC/DBIC/archive/ds003020
  • Access: Open (no restrictions to rc-DBIC users)

Online resources

103 tasks dataset

The 103 Tasks dataset is a deep phenotyping dataset that includes 18 scans collected over 3 days from 6 participants (ages 22-33) in Osaka, Japan. Each day included 6 scans during which the full battery of tasks was completed. The tasks include visual, auditory, motor, language, memory, and introspective tasks. Each scan includes 77-83 tasks, and pairs of scans provide the full complement, giving repeated measures both within and across days for each participant. Data were collected at 3T using a gradient-echo multiband EPI sequence with a 2 s TR and 2x2x2 mm voxels. T1 images are also provided. Compared with other dense phenotyping datasets, this dataset has relatively poor signal quality, but in exchange it is especially task rich within each scan and session, allowing characterization of high-dimensional neural representations without scan or session confounds.

  • Path to the dataset on Discovery: /vast/labs/DBIC/datasets/103tasks
  • Access: Open (no restrictions to rc-DBIC users)

Online Resources

  • Nakai, Tomoya and Nishimoto, Shinji (2020). "Quantitative models reveal the organization of diverse cognitive functions in the brain". Nature Communications 11(1). doi:10.1038/s41467-020-14913-w

ImageNet database

ImageNet is a large visual database (14M images) used to train visual object recognition and localization models. Images are organized into synonym sets ("synsets") that serve as object category labels; the "ImageNet Large Scale Visual Recognition Challenge" (ILSVRC) uses a 1000-category subset. The 2012 version of the ILSVRC dataset has seen particular use in neuroAI applications for training feedforward neural network models for comparison with the ventral visual system and other brain networks. It is particularly well suited for training very deep neural networks like ResNets. The dataset contains a very large number of small files and is stored as a tarball to avoid poor performance over NFS storage. Please copy the tarball to local scratch space (/scratch) and extract it there; this gives much better performance (by orders of magnitude) on HPC systems. A script is provided demonstrating how this can be done efficiently.

  • Path to dataset on Discovery: /vast/labs/DBIC/datasets/ImageNet
  • Access: All users must read the ImageNet terms of access (https://www.image-net.org/download.php) and cite both references listed below.
    • Group limited access permission: Email Jamie Ford
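
The copy-then-extract workflow described above can be sketched in Python as follows. This is an illustration only, not the script provided with the dataset. The helper `stage_tarball`, the tarball name `imagenet.tar`, and the scratch subdirectory `imagenet_demo` are hypothetical; check the dataset directory for the actual file names.

```python
import shutil
import tarfile
from pathlib import Path


def stage_tarball(tar_path, scratch_dir):
    """Copy a tarball to local scratch, extract it there, and return the extraction directory."""
    tar_path = Path(tar_path)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)

    # One large sequential copy is much cheaper over network storage
    # than reading many small files individually.
    local_tar = scratch_dir / tar_path.name
    shutil.copy2(tar_path, local_tar)

    out_dir = scratch_dir / "extracted"
    out_dir.mkdir(exist_ok=True)
    with tarfile.open(local_tar) as tar:
        tar.extractall(out_dir)

    # Remove the local copy of the tarball to free scratch space.
    local_tar.unlink()
    return out_dir


if __name__ == "__main__":
    # Hypothetical file and directory names, used only for illustration.
    src = Path("/vast/labs/DBIC/datasets/ImageNet") / "imagenet.tar"
    if src.exists():
        print(stage_tarball(src, Path("/scratch") / "imagenet_demo"))
```

Extracting on local scratch keeps the many small-file reads off the network file system; remember to clean up the scratch directory when the job finishes.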

Online Resources

  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. CVPR 2009.
  • Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.