The QBS MS Health Data Science Curriculum

The academic curriculum for the QBS MS concentration in Health Data Science is designed to provide students with the core skills of data science, including Big Data wrangling, database programming, high performance computing, data visualization, exploratory statistics, statistical modeling, and machine learning. Students will also gain valuable training in communicating results in verbal, visual, and written reports.

Required Core Courses

The required core courses are as follows: an Applied Machine Learning course (QBS 108), two terms of biostatistics (QBS 120 & 121), one term of epidemiology (QBS 130), a course on Algorithms for Data Science (QBS 177), and courses in Data Visualization and Data Wrangling (QBS 180 & 181). All students are also required to enroll in a Capstone course (QBS 185) and have the option to pursue a summer internship after their first year.

QBS 108: Applied Machine Learning

Course Directors: Saeed Hassanpour & George Cybenko

This course will introduce students to modern machine learning techniques as they apply to engineering and applied scientific and technical problems. Techniques such as recurrent neural networks, deep learning, reinforcement learning and online learning will be specifically covered. Theoretical underpinnings such as VC-Dimension, PAC Learning and universal approximation will be covered together with applications to audio classification, image and video analysis, control, signal processing, computer security and complex systems modeling. Students will gain experience with state-of-the-art software systems for machine learning through both assignments and projects.

QBS 120: Foundations of Biostatistics I: Statistical Theory for the Quantitative Biomedical Sciences

Course Directors: Tor Tosteson, Zhigang Li, and Robert Frost

This is a graduate-level course in statistics designed to teach the fundamental knowledge required to read and, with further study, contribute to the statistical methodology literature. An in-depth overview of statistical estimation and hypothesis testing will be provided, including the method of least squares, maximum likelihood methods, asymptotic methods, Bayesian inference, multivariate hypothesis testing and correction for multiple comparisons, quasi and partial likelihood, M-estimation, sandwich variance, and the delta method. The basic elements of statistical design and sample size calculations will be introduced. Resampling strategies will be discussed in the context of the bootstrap and cross validation, as well as simulation as a tool for statistical research. The emphasis will be on theory used in modern applications in biomedical sciences, including genomics, molecular epidemiology, and translational research. The course will feature computational examples using the statistical package R, but will also give students exposure to other popular statistical packages such as SAS and Stata. The course will meet for 3 hours per week.

QBS 121: Foundations of Biostatistics II: Regression

Course Directors: Tor Tosteson and Todd MacKenzie

This course covers generalized regression theory and applications as practiced in biostatistics and the quantitative biomedical sciences. The basics of linear model theory are presented and extended to generalized linear models for binary, counted, and categorical data; regression models for censored survival data; and multivariate regression and mixed fixed and random effects regression models for longitudinal and repeated measures data. Special topics include measurement error in regression, instrumental variables, causal inference, propensity scores and inverse propensity weighted estimation, and methods for missing data. Current statistical methodologies for model selection and classification are introduced in the context of applications in genomics and the biomedical sciences. The course features computational examples using the statistical package R, with references as necessary to other statistical packages. The course meets 3 hours per week. Most course meetings will consist of presentations and demonstrations of analytic methods using datasets from QBS projects and R or other statistical software. The final meeting will feature presentations of class projects consisting of the explanation and application of a novel regression methodology in a QBS case study.

QBS 130: Foundations of Epidemiology I: Theory and Methods

Course Director: Diane Gilbert-Diamond

This is the first of a two-course sequence in graduate-level epidemiology (Foundations of Epidemiology I and II). The two courses are designed to teach the underlying theory of epidemiologic study designs and analysis and to prepare students for the conduct of epidemiologic research. Designs for investigations seeking to understand the cause of human disease, disease progression, treatment, and screening methods include clinical trials, cohort studies, case-cohort, case-case, nested case-control, and case-control designs. Concepts of incidence rates, attributable rate and relative rate, induction and latent periods of disease occurrence, confounding, effect modification, misclassification, and causal inference will be covered in depth.

QBS 149: Mathematics and Probability for Statistics and Data Mining

Course Director: Jinyoung Byun

Optional if student tests out.

This course will cover the fundamental concepts and methods in mathematics and probability necessary to study statistical theory. Topics will include univariate and multivariate probability distributions with emphasis on the normal distribution, conditional distributions, mathematical expectation, convergence in probability and distribution, and the central limit theorem. Relevant concepts and methods from univariate and multivariate calculus will be introduced as necessary, along with related topics in linear and matrix algebra. Computational methods for statistics, including nonlinear optimization and Monte Carlo simulation, will be introduced. Special attention will be given to students' active learning by programming in a statistical software package. The course will meet for 3 hours per week.

QBS 177: Algorithms for Data Science

Course Directors: Chris Amos and Saeed Hassanpour

This course provides an introduction to algorithms used in data science with applications to biomedical science. The goal of this course is to present an overview of many of the approaches used for big data, focusing on analytical methods. The course assumes that students have some knowledge of R. Students will be provided with two large databases. Lectures on data reduction, classification, and optimization will be accompanied by homework that students complete using these datasets. In addition, students will form teams at the beginning of the term, propose a project, and complete an analysis to be presented at the end of the term. All grades will be derived from the homework and project summaries.

QBS 180: Data Visualization and Statistical Graphics

Course Directors: Ramesh Yapalparvi and Kristen Anton

This course will teach best practices for visualizing data, including exploratory statistics and effective communication of statistical analysis. Students will become competent in engaging diverse audiences in the process of analytic thinking and decision making. Topics include principles of graphic design, perceptual psychology, dashboards, dimensionality reduction, statistical smoothing, and 3D graphics. Students will become competent users of Tableau, R graphics, and R-Shiny.

QBS 181: Data Wrangling

Course Directors: Ramesh Yapalparvi and Eugene Demidenko

This course is a survey of methods for extracting and processing data. It will cover data architectures (ontologies, metadata, pipelines, and open-source resources), database theory, data warehouses, the electronic medical record, various file formats including audio and video, data security, and cloud resources. Students will gain skills working with Big Data using software such as SQL, Apache Hadoop, and Python.

QBS 185: Health Data Science Capstone

Course Director: Olga Gorlova

The capstone consists of projects completed by the students in which they bring together all aspects of data science: 1) conception of a problem to solve; 2) extraction, merging, and construction of an analyzable data set (data wrangling) using big data; 3) exploratory statistics using principles of data visualization; 4) statistical analysis and/or machine learning of big data; 5) communication of the results in written, verbal, and visual form. Students will work on their capstone full-time (3 units) over the course of the summer, fall, or winter quarters. Data science projects will be sought from industry as much as possible. It is anticipated that students who choose a summer internship are likely to work on a fall or winter capstone related to their internship.

QBS 186: Health Data Science Summer Internship


International students who choose to pursue an internship during their summer quarter will need to register for this course. Successful completion of the internship will be determined by a letter provided by the student's supervisor, mentor, or manager. It is anticipated that students who choose a summer internship are likely to work on a fall or winter capstone related to their internship.

Elective Courses

Satisfactory completion of up to 9 approved graduate-level elective courses is required of all students. Below is the list of QBS electives, along with electives from other departments that may be taken if space allows.

QBS 100: Molecular Basis of Human Health and Disease

Course Directors: Kristine Giffin and Michael Whitfield

This course is designed to solidify key cellular, molecular, and genetic concepts in the biology of human health and disease. Students in this course will develop a fundamental understanding of the molecular pathogenesis and genetic predisposition to disease, be familiar with the modern tools and technologies to study disease in model systems and human populations, and be able to read, present, and discuss primary literature on human pathobiology. Topics include normal and pathologic cellular processes, genetic and epigenetic mechanisms, and examples of major disease outcomes such as cancer.

QBS 123: Biostatistics Consulting Lab

Course Directors: Tor Tosteson and Todd MacKenzie

The goal of this course is to have students gain experience contributing to the statistical aspects of health sciences research. Students will be mentored by Biostatistics faculty members while interacting with investigators from the Geisel School of Medicine and Dartmouth-Hitchcock Medical Center who seek support from the Synergy Biostatistics Consulting Core (BCC). Course requirements will include participation in the bi-weekly BCC walk-in consulting clinics, shadowing BCC staff and faculty in other statistical collaborative meetings, and preparing statistical analyses, sample size calculations, reports, and analytic tables and figures. Student performance will be evaluated by review of student summaries of their consulting activities and by feedback surveys from BCC collaborators, faculty, and staff.

QBS 131: Foundations of Epidemiology II: Theory and Methods

Course Director: Megan Romano

Epidemiology is the science of studying and understanding the patterns of disease occurrence in human populations with the ultimate goal of preventing human disease. This graduate-level course is the second in a two-part sequence. Building on concepts covered in Foundations of Epidemiology I, it aims to develop an in-depth understanding of population characteristics and disease frequencies, epidemiological study designs, measures of excess risk associated with specific exposures, and inferring causality in exposure-disease relationships.

QBS 132: Molecular Biologic Markers in Human Health Studies

Course Director: Angeline Andrew

This course covers the use of human tissue samples in the context of translational research, including observational epidemiology studies and clinical trials. Lectures focus on study design, bio-specimen collection, biomarker types, kinetics, and validation. Discussion will focus on examples of biomarker utilization, including identifying susceptible populations, exposure assessment, molecular-genetic characterization of disease phenotype, evaluating drug compliance, monitoring dose response, and testing molecularly targeted therapy.

QBS 132-2: Analysis of Human Molecular Biologic Markers

Course Director: Angeline Andrew

This computer laboratory-based course accompanies Molecular Biologic Markers in Human Health Studies and provides students with hands-on experience with modern analytic approaches to data generated from state-of-the-art molecular studies of human tissues, including many of the “omics” technologies (e.g. DNA methylation array data), and with integrated analysis. Students will apply techniques for identifying and evaluating clusters and interactions. The course includes application of study design principles, statistical modeling, and bioinformatic approaches.

QBS 136 & 137: Applied Epidemiological Methods I & II

Course Director: Anne Hoen

Computer laboratory-based course designed to provide hands-on experience performing epidemiological data analyses relevant to the theoretical/conceptual material presented in Foundations of Epidemiology I & II. Students will complete laboratory exercises using epidemiological study data sets that guide them through descriptive data analyses, hypothesis testing within the context of a range of epidemiological study designs, causal inference methods, and power and sample size calculations. Analyses will be performed in the open-access programming language R. Note that this is a single course spread over two quarters.

QBS 146: Foundations of Bioinformatics I

Course Director: Michael Whitfield

The sequencing of the complete genomes of many organisms is transforming biology into an information science. This means the modern biologist must possess both molecular and computational skills to adequately mine this data for biological insights. Taught mainly from the primary literature, topics will include genome sequencing and annotation, genome variation, gene mapping, genetic association studies, gene expression and functional genomics, proteomics and systems biology. The course will meet for 3 hours per week.

QBS 147: Genomics: From Data to Analysis

Course Director: Olga Zhaxybayeva

Massive amounts of genomic data pervade 21st century life science. Physicians now assess the risk and susceptibility of their patients to disease by sequencing the patient's genome. Scientists design possible vaccines and treatments based on the genomic sequences of viruses and bacterial pathogens. Better-yielding crop plants are assessed by sequencing their transcriptomes. Moreover, we can more fully explore the roots of humanity by comparing our genomes to those of our close ancestors (e.g., Neanderthals, Denisovans). In this course, students will address real-world problems using the tools of modern genomic analyses. Each week students will address a problem using different types of genomic data, and use the latest analytical technologies to develop answers. Topics will include pairwise genome comparisons, evolutionary patterns, gene expression profiles, genome-wide associations for disease discovery, non-coding RNAs, natural selection at the molecular level, and metagenomic analyses.

QBS 175: Foundations of Bioinformatics II

Course Director: David Jewell

Computation is vital for modern molecular biology, helping scientists to model, predict the behaviors of, and control the molecular machinery of the cell. This course will study algorithmic challenges in analyzing biomolecular sequences (what genes encode an organism, and how are genes related across organisms?), structures (what do the proteins constructed for these genes look like, and what does that tell us about their mechanisms?), and functions (what do these things do, and how do they interact with each other in doing it?). The course is application-driven, but focused on the underlying algorithms and information processing techniques, employing approaches from search, optimization, pattern recognition, and so forth. The course will meet for 3 hours per week.

QBS 176: Methods in Statistical Genetics and Genomics

Course Directors: Chris Amos, Ivan Gorlov, and Jinyoung Byun

The purpose of this course is to provide students with training in methods of statistical genetics, especially genetic epidemiology designed to identify genetic factors. This course provides instruction on tools of genetic analysis for both simple and complex diseases. We will also instruct students in the use of association methods for gene discovery and in the mining and analysis of sequence data, and provide some discussion of data mining tools for genetic epidemiology. The course teaches both the theory and application of some commonly used methods for linkage and association analysis. In this course we will focus on discrete traits or diseases.

QBS 194: Biostatistics Journal Club

Course Directors: Chris Amos and Jiang Gui

This is a 1-credit-hour course that discusses new findings and applications in biostatistics and data science. The format comprises a monthly seminar from a faculty member, usually from an external location, and presentations in the format of a journal club for the remainder of the weekly meetings. The journal club format is an informal structure in which students present one or possibly two manuscripts (if two manuscripts are discussed, they must be thematically related).

QBS 195: Independent Study

Course Directors: Arranged

Independent study in QBS is structured to allow students to explore subject matter and enhance their knowledge in QBS-related fields. This independent study for QBS students will count as an elective credit and is offered during each academic term. The arrangement and a course outline are to be developed between the student and a QBS faculty member prior to the start of the term, and approved by QBS administration. The student and faculty member will work together to structure the study program and set goals that are to be met by the end of the term. The course of study may include, but is not limited to, literature review, seminar attendance, online course material, small projects, and presentations related to the specific field being studied. This can also substitute for a journal club credit after the first year.

QBS 270: QBS Journal Club

Course Directors

  • Fall: Biostatistics (QBS 194) – Jiang Gui and Chris Amos
  • Winter: Epidemiology – Jennifer Emond
  • Spring: Computational Biology/Bioinformatics – Robert Frost and Saeed Hassanpour

An essential element of scientific training is the critical analysis and communication of experimental research in an oral format. Individuals will identify a QBS faculty member who will guide them in choosing a current paper related to their research or published by their lab. This faculty member will also advise the student on the paper presentation format and attend the journal club that week. The presentation should include a brief discussion of the significance of the paper as well as a description of the methods used. While the presenter should be prepared to lead the discussion, members of the journal club are expected to come with questions about the paper. These questions can focus on methods, discussion, and interpretation of the results and their implications.

QBS 271: Epidemiology Graduate Seminar II: Current topics in Epidemiology

Course Director: Jennifer Emond

Student-led graduate-level seminar. Students will identify and present two influential epidemiological or biomedical research studies that used different epidemiologic study designs to address a research question. Students will be encouraged to discuss and critically analyze the motivation for the studies, the research design, key findings, study limitations, and study implications, and to present aims for a future study that will address gaps in the research or be a clear extension of the research to date.

PH 147: Advanced Methods in Health Services Research

Course Director: Tracy Onega

This course will develop student analytic competencies to the level necessary to conceptualize, plan, carry out, and effectively communicate small research projects in patient care, epidemiology, or health services. Lectures, demonstrations, and labs will be used to integrate and extend methods introduced in other TDI courses. The course will also cover new methods in epidemiology and health services. The students will use research datasets from the Medical Care Epidemiology Unit at TDI, including Medicare data, in classroom lab exercises and course assignments. Course topics focus on key aspects of observational research, including risk adjustment, multilevel analyses, instrumental variables, and small area analysis. Practical skill areas will include programming in Stata, studying datasets for completeness and quality, designing tables and figures, and data management techniques. Emphasis is on becoming independent in analytic workflow. The instructors will tutor students as they develop their own analytic projects.

PH 271: The Practice of Statistics in Medicine

Course Directors: Todd MacKenzie and James O'Malley

The aim of this course is to train students in the identification of appropriate research designs and analyses, software-aided (primarily Stata) implementation and interpretation, effective communication of results, and rigorous critique of statistical work. A combination of conceptual, technical, and illustrative explanations with examples will support learning in a seminar-style classroom environment. The course includes 10 modules, each of three weeks' duration. The sequence within each module is didactic instruction and interactive illustration of how to implement analyses and interpret output over the first two weeks, followed by participant presentations of research in progress or of course-relevant papers and other materials in the third week.

CS 174: Machine Learning and Statistical Data Analysis

Course Director: Lorenzo Torresani

This course provides an introduction to statistical modeling and machine learning. Topics include learning theory, supervised and unsupervised machine learning, statistical inference, and prediction. A wide variety of algorithms will be presented, including K-nearest neighbors, naive Bayes, decision trees, support vector machines, logistic regression, K-means, mixtures of Gaussians, principal components analysis, and Expectation Maximization. The course will also discuss modern applications of machine learning such as image segmentation and categorization, speech recognition, and text processing.

ENGM 182: Data Analytics

Course Director: Geoffrey Parker

This course provides a hands-on introduction to the concepts, methods and processes of business analytics. Students learn how to obtain and draw business inferences from data by asking the right questions and using the appropriate tools. Topics include data preparation, statistical tools, data mining, visualization, and the overall process of using analytics to solve business problems. Students work with real-world business data and analytics software. Where possible, cases are used to motivate the topic being covered. Students acquire a working knowledge of the “R” language and environment for statistical computing and graphics. Prior experience with “R” is not necessary, but students should have a basic familiarity with statistics, probability, and be comfortable with basic data manipulation in Excel spreadsheets.

MATH 116: Topics in Applied Mathematics: Fundamentals in Numerical Analysis

Course Director: Anne Gelb

Many mathematical models arising in various applications cannot be solved analytically. This course teaches fundamentals of numerical analysis, including a brief overview of numerical linear algebra, root finding methods, interpolation and approximation, and methods for solving ordinary differential equations. The course will focus on how numerical algorithms are constructed and analyzed in terms of their accuracy, efficiency, and stability. Students will use MATLAB to demonstrate the validity and/or failure of various approaches in different situations.

MATH 126: Current Problems in Applied Mathematics

Course Director: Feng Fu

Partial differential equations (PDEs) are essential for the modelling of physical phenomena appearing in a variety of fields from geophysics and fluid dynamics to geometry. In this course, we will study three major topics one should understand when modelling with PDEs. The topics are: (i) the theory (e.g. existence and uniqueness of solutions); (ii) when and how solutions can be found analytically; (iii) classic numerical techniques (e.g. finite difference and finite element methods) and how to determine if the method is stable and convergent. In addition, we will discuss the limitations of existing solution techniques in the context of open research questions.

Degree Requirements

Requirements for the Master of Science (MS) in Quantitative Biomedical Sciences (QBS) with a Concentration in Health Data Science

Health Data Science students have access to interdisciplinary courses that position them competitively for careers in Big Data, healthcare, and biomedicine in both academia and industry. Students complete 9 required courses, including a capstone that brings together data wrangling, exploratory data analysis, programming, statistical learning, epidemiology, data visualization, and communication. In addition, up to 9 elective courses are required during the 5 quarters in residence. Students are encouraged to pursue an internship during the summer, which extends their last session from the fall quarter to the winter quarter.

  1. Satisfactory completion of the following courses:

    Mathematics & Probability for Statistics & Data Mining
    Foundations of Biostatistics I
    Foundations of Biostatistics II
    Machine Learning
    Data Wrangling
    Data Visualization
    Epidemiology I
    Algorithms for Data Science

  2. Satisfactory completion of 6 approved graduate-level elective courses

  3. Completion of the mandatory first-year ethics course required of all first-year graduate students

*4+1 students must satisfy the degree requirements listed above but must complete 3 courses from the list during their Dartmouth undergraduate training.

For details of required coursework see: Curriculum