based on a presentation by Athena Nomikos
Haplotypes are groups of closely linked alleles that tend to be inherited together. Because they are rarely separated by recombination, haplotypes can be used to map genes very accurately. Some segments of the ancestral chromosomes persist as regions of DNA sequence that are shared by multiple individuals, regions where the chromosome has not been broken by recombination. If a particular haplotype contains a disease allele, the disease allele will tend to be passed on together with the closest SNPs in the group through subsequent generations, allowing for identification of the conserved region.
IV. Breast Cancer Susceptibility V. Genotyping Individuals for SNPs
Genetics 144: Oncogenomics
Dartmouth Medical School
Course Director: Charles Brenner, Ph.D.
January 23, 2006
Please note: Online subscriptions necessary to view figures
I. Single Nucleotide Polymorphisms
Across the population, two unrelated individuals share more than 99% of their DNA sequence. The remaining less than 1% contains the genetic variations that make us unique. Among these variations are alterations that affect susceptibility to disease and responses to environmental conditions. A single nucleotide polymorphism (SNP) is a variation in a single nucleotide (A, T, C, or G) in the DNA sequence of the genome (Figure 1). For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs make up 90% of all genetic variation and occur every 100-300 bases along the genome. Two out of every three SNPs involve the replacement of cytosine with thymine.
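The two numerical statements above (the 1% threshold and the spacing of one SNP per 100-300 bases) can be illustrated with a short sketch. It assumes that "occurs in at least 1% of the population" is read as a minor allele frequency of at least 0.01, and that the genome is roughly 3 billion bases long; the function names and the allele counts are hypothetical.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele at one site.

    allele_counts maps a base ("A", "C", "G", "T") to the number of
    chromosomes on which it was observed (hypothetical input format).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    freqs = sorted((n / total for n in allele_counts.values()), reverse=True)
    return freqs[1] if len(freqs) > 1 else 0.0


def is_snp(allele_counts, threshold=0.01):
    """Apply the 1% rule, read here as minor allele frequency >= threshold."""
    return minor_allele_frequency(allele_counts) >= threshold


# Assumed genome size of about 3 billion bases (not stated in the text above).
GENOME_BASES = 3_000_000_000
snps_low = GENOME_BASES // 300   # one SNP every 300 bases
snps_high = GENOME_BASES // 100  # one SNP every 100 bases

if __name__ == "__main__":
    print(is_snp({"C": 980, "T": 20}))  # T seen on 2% of chromosomes
    print(is_snp({"C": 995, "T": 5}))   # T seen on 0.5% of chromosomes
    print(snps_low, snps_high)
```

Under these assumptions, the spacing quoted above implies on the order of 10 to 30 million SNP positions in the genome.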

SNPs can occur in both coding and noncoding regions, with noncoding SNPs arising from sporadic mutations and occurring at different frequencies (or not at all) in different geographic populations. Because SNPs do not occur only in coding regions of the genome, they do not necessarily have an effect on cell function. However, they can predispose individuals to a particular disease or affect drug response or metabolism. Understanding these variations can lead to an understanding of the causes of disease as well as of possible therapies.
In April 1999, the U.K. Wellcome Trust, in collaboration with ten pharmaceutical companies, established The SNP Consortium (TSC) to find and map common SNPs. The goal was to produce a publicly available genetic map using SNPs as markers evenly distributed across the human genome. The database now allows users to search by gene or SNP keyword, as well as find SNP allele frequency data and download SNP linkage maps (1). The ultimate goal will be to use these linked SNP haplotypes as possible genetic markers for disease predisposition.
II. Disease Susceptibility: Low Penetrance Alleles and Linkage Analysis
Penetrance is the likelihood that a given genotype will result in a phenotype. The term low penetrance alleles reflects the idea that not all alleles contribute equally to a disease phenotype. Low penetrance alleles may therefore be harder to distinguish in the population, either because they are less frequent or because their phenotypic effects are difficult to see, but they can still lead to disease susceptibility.
Similarly, the polygenic theory, along with the Common Disease/Common Variant (CDCV) Theory, claims that many commonly occurring diseases are caused by commonly occurring alleles (2). However, these alleles may not impact reproductive fitness because of incomplete penetrance. Because of this, these low penetrance alleles may actually be more prevalent in the population than high penetrance alleles.
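A small numerical sketch may help show why a common, low penetrance allele can matter as much as a rare, high penetrance one. All frequencies and penetrance values below are hypothetical and chosen only for illustration; the function names are not taken from any cited paper.

```python
def penetrance(affected_carriers, total_carriers):
    """Fraction of carriers of a genotype who show the phenotype."""
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("need total > 0 and 0 <= affected <= total")
    return affected_carriers / total_carriers


def expected_affected_carriers(population, carrier_frequency, penetrance_value):
    """Expected number of affected carriers in a population of given size."""
    return population * carrier_frequency * penetrance_value


# Hypothetical rare allele with high penetrance.
rare_high = expected_affected_carriers(100_000, 0.001, 0.80)
# Hypothetical common allele with low penetrance.
common_low = expected_affected_carriers(100_000, 0.20, 0.01)

if __name__ == "__main__":
    print(penetrance(8, 10))
    print(rare_high, common_low)
```

With these made-up numbers the common, low penetrance allele accounts for more affected carriers than the rare, high penetrance allele, even though any single carrier is far less likely to be affected.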
The ultimate goal of linkage analysis is to find positions on the genome that code for, or are linked to, human disease states, based on the known locations of genetic markers (DNA sequences that differ in size or sequence across the population). This is done by measuring recombination between different markers in the genome, taking into account that recombination occurs frequently between disease and marker alleles that are far apart, while little or no recombination occurs between disease and marker alleles that are close together, or linked.
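The idea of measuring recombination between a marker and a disease locus can be sketched as follows. The recombination fraction is the share of informative meioses in which the two loci were separated, and the LOD score is the standard statistic that compares linkage at a given recombination fraction against free recombination (0.5). The LOD score is not discussed in the text above, the formula assumes phase-known meioses, and the counts are hypothetical.

```python
import math


def recombination_fraction(recombinants, meioses):
    """Observed share of meioses in which marker and disease allele separated."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    return recombinants / meioses


def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio of linkage at theta versus no linkage (0.5).

    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants in 20 informative meioses.
    theta_hat = recombination_fraction(2, 20)
    print(theta_hat, lod_score(2, 20, theta_hat))
```

By convention, a LOD score above 3 is usually taken as evidence for linkage; the hypothetical counts above are chosen to fall just beyond that level.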
Linkage disequilibrium (LD), which reflects the history of recombination in the ancestral chromosomes, is a measure of correlation, co-segregation, or association between a genetic marker and a disease. LD is determined by the deviation of the haplotype frequencies in a population from the values they would have if the alleles at each locus were combined randomly (3).
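The definition of LD as a deviation from random combination can be written out for two biallelic loci. In the sketch below, D is the difference between the observed haplotype frequency and the product of the two allele frequencies; D' and r-squared are the commonly used normalized forms. The function name and the input frequencies are hypothetical, and D' is reported as an absolute value.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for alleles A and B at two loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # Deviation from the frequency expected under random combination.
    d = p_ab - p_a * p_b

    # Largest |D| that is possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Alleles always found together: complete LD.
    print(ld_statistics(0.5, 0.5, 0.5))
    # Alleles combined at random: no LD.
    print(ld_statistics(0.25, 0.5, 0.5))
```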
Problems with using linkage analysis to determine genetic predisposition include decreased penetrance, chance clustering, sporadic mutations that are common in the population, and genetic heterogeneity among individuals. It is this last point that makes it important to carry out case-control SNP studies in a genetically isolated population, testing for regions of the genome in which one specific founder haplotype is significantly more frequent in disease cases than in controls.
III. Haplotype Association
In 2002, the HapMap Project was launched. HapMap is an international effort to identify and catalog genetic similarities and differences as well as to identify chromosomal regions where genetic variants are shared. The initial phase of the project collected DNA samples from 270 people of African, Asian, and European descent, looking for haplotypes with frequencies of 5% or higher in a population. The HapMap Project identifies SNPs in DNA samples from multiple individuals. Adjacent SNPs that are inherited together, and are therefore thought to be linked, are compiled into haplotypes and used as tags in the identification of conserved regions. In this manner, the goal is to identify regions containing disease alleles as well as alleles that predispose individuals to risk from a particular environmental factor or medication. The project aims to determine what these variants are, where they occur on the genome, and how they are distributed among people within a population (4).
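The 5% frequency cutoff for haplotypes can be illustrated with a minimal sketch. It assumes the chromosomes have already been phased, so that each one is represented by the string of alleles it carries at a set of adjacent SNPs; the actual project has to infer this from genotype data, which is not shown here. The function name and the sample data are hypothetical.

```python
from collections import Counter


def common_haplotypes(phased_chromosomes, min_frequency=0.05):
    """Return {haplotype: frequency} for haplotypes at or above the cutoff.

    phased_chromosomes: one string per chromosome, listing the alleles at
    a fixed set of adjacent SNPs (hypothetical input format).
    """
    counts = Counter(phased_chromosomes)
    total = sum(counts.values())
    return {
        haplotype: n / total
        for haplotype, n in counts.items()
        if n / total >= min_frequency
    }


if __name__ == "__main__":
    # Hypothetical sample of 100 chromosomes typed at three adjacent SNPs.
    sample = ["ACG"] * 60 + ["ATG"] * 37 + ["GTC"] * 3
    print(common_haplotypes(sample))
```

In this made-up sample, the haplotype seen on only 3 of 100 chromosomes falls below the cutoff and is dropped, while the two common haplotypes are kept.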
IV. Breast Cancer Susceptibility
Most breast cancer (approximately 90-95%) is sporadic. However, of the breast cancer cases that appear to have a strong hereditary component, a large percentage (almost half) are due to inherited mutations in the tumor suppressor genes BRCA1 and BRCA2, with subsequent loss of heterozygosity in the tumor. The BRCA1 gene is found on chromosome 17, and its mutations involve changes in one or more DNA base pairs or the rearrangement of large segments of DNA. The common BRCA1 mutations are BRCA1*187delAG and BRCA1*5183insC. BRCA2 is located on chromosome 13, and its mutations commonly involve insertions or deletions, such as BRCA2*6174delT. In both cases, the mutations lead to an abnormally short protein product that does not function properly. Upon DNA damage (either endogenous or exogenous), normally functioning BRCA1 and BRCA2 relocalize to areas of damage and repair the lesion, allowing the cell cycle to proceed normally (5). For more information on BRCA mutations and possible therapeutic approaches, please refer to a presentation by Samuel Bakhoum.
V. Genotyping Individuals for SNPs
In 2003, Kennedy et al. (6) described a method for genotyping SNPs in three different human populations without locus-specific primers and for identifying those SNPs with significant allele frequency differences between groups. They first digested the total genomic DNA and identified fragments of the desired sizes, excluding those with repetitive sequences. They then prepared samples using a single oligonucleotide primer for amplification and discriminated alleles with synthetic DNA microarrays, a method called "Fragment Selection by PCR" (FSP). The arrays were designed to look only at the SNPs that had been predicted, through previous biochemical assays, to be amplified and to be present on those fragments. In total, fifty-six probes were synthesized for each SNP in order to take into account varying positions on the genome and to afford more accuracy in location determination. Probes were synthesized for both the sense and the antisense strands, as well as for different positions of the two SNP alleles. Using an algorithm to calculate discrimination values for the different SNPs, the authors selected those with a significant measure of sequence specificity.
Ellis et al. (7) then used this list of SNPs, along with that generated by Matsuzaki et al. (8) in a manner similar to the Kennedy paper, to genotype 8,576 SNPs in the Ashkenazi Jewish population. The Ashkenazim are presumed to descend from a relatively small number of founders and are therefore considered a genetically isolated population. This means that one of the Ashkenazi founders presumably carried the BRCA2*6174delT mutation, and this mutation, along with any closely linked SNPs, was passed on through subsequent generations as a haplotype with no intervening recombination. Ellis et al. looked at individuals carrying the BRCA2*6174delT breast cancer mutation, as well as kindred and unrelated Ashkenazim without the BRCA2 mutation, in order to determine whether any SNPs mapped (or were linked) to the BRCA2 region.
Table II of the paper shows the mean and median distances between SNPs on each chromosome. These data show that the SNPs being studied are not distributed evenly across the genome. Figure 1 of the paper shows that the SNPs significantly associated with the BRCA2 mutation map close to the BRCA2 region. This SNP location remains statistically significant even after correction for multiple tests (Figure 1B of the paper). Similarly, Figure 2 of the paper shows allele frequencies and P values for SNPs found in control and disease case individuals. The arrow indicates the location of the BRCA2 mutation, and it can be seen that SNPs near this region have different frequencies in the different experimental groups. For example, SNP TSC1378449 has a frequency of 0.09 in the healthy controls and a frequency of 0.5 in the diseased individuals.
Ellis et al. also looked at genotype frequencies in kindred and healthy controls but did not find statistically significant SNPs that were localized only around the BRCA2 region. Instead, the SNPs occurred throughout the entire genome (Figure 3). The same was true when they tried to detect novel susceptibility loci (Figure 4): no significant SNPs were found near the BRCA2 region.
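The comparison of allele frequencies between cases and controls, and the correction for multiple tests, can be sketched as below. This is an illustration under stated assumptions, not the statistical procedure used by Ellis et al.: it uses a simple chi-square test on a 2x2 table of allele counts and a Bonferroni threshold. The allele counts are hypothetical, chosen only so that the frequencies match the 0.5 and 0.09 quoted for TSC1378449; the number of tests is the 8,576 SNPs mentioned above.

```python
import math


def chi_square_2x2(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic (1 degree of freedom, no continuity correction)
    and its P value for a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(alpha, number_of_tests):
    """Per-test significance level that keeps the overall level at alpha."""
    return alpha / number_of_tests


if __name__ == "__main__":
    # Hypothetical counts: 27 of 54 case chromosomes (frequency 0.5) and
    # 18 of 200 control chromosomes (frequency 0.09) carry the allele.
    chi2, p = chi_square_2x2(27, 27, 18, 182)
    threshold = bonferroni_threshold(0.05, 8576)
    print(chi2, p, threshold, p < threshold)
```

With these assumed counts the difference would remain significant after correction, but the result depends entirely on the sample sizes, which are invented here.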
VI. Conclusions
Because SNPs are evolutionarily stable, they are easy to follow in population studies. Furthermore, because millions of SNPs are estimated to lie along the roughly 3 billion bases of the human genome, pharmaceutical companies are attracted to the huge financial prospects that SNP discovery and technology may offer. Determining linkage between inherited SNPs and known disease alleles may ultimately lead to advances in both diagnosing and treating disease predispositions. Ellis et al. attempted to use genome-wide LD mapping to identify novel genes, hypothesizing that disease-causing mutations and the surrounding genes would be close to identical in the Ashkenazi population due to descent from a common founder. While they did determine that some known SNPs were localized near the BRCA2 region, they were unable to identify, with statistical significance, new SNP locations that distinguished diseased from healthy individuals. Of course, even in this relatively genetically isolated population, genetic heterogeneity still posed problems for the experimental design. The low penetrance of the disease allele, along with the small number of experimental subjects, was another problem. In this case, 27 diseased individuals were studied, but in order to find true SNPs (which must, by definition, be present in at least 1% of the population) a larger group of subjects is required. The authors also estimated that while they used only 8,756 SNPs in their study, 500,000 would really be required to efficiently map low-penetrance disease alleles.
VII. References
1. Thorisson, G.A., and Stein, L.D. The SNP Consortium website: past, present and future. Nucleic Acids Res, 2003, 31 (1): p. 124-7.
2. Smith, D.J., and Lusis, A.J. The allelic structure of common disease. Hum Mol Genet, 2002, 11 (20): p. 2455-61.
3. Crawford, D.C., Akey, D.T., and Nickerson, D.A. The patterns of natural variation in human genes. Annu Rev Genomics Hum Genet, 2005, 6: p. 287-312.
4. Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P.; International HapMap Consortium. A Haplotype Map of the Human Genome. Nature, 2005, 437 (7063): p. 1299-320.
5. Yoshida, K., and Miki, Y. Role of BRCA1 and BRCA2 as regulators of DNA repair, transcription, and cell cycle in response to DNA damage. Cancer Sci, 2004, 95 (11): p. 866-71.
6. Kennedy, G.C., et al. Large-scale genotyping of complex DNA. Nat Biotechnol, 2003, 21: p. 1233-1237.
7. Ellis, N.A., et al. Localization of Breast Cancer Susceptibility Loci by Genome-Wide SNP Linkage Disequilibrium Mapping. Genet Epidemiol, 2006, 30: p. 48-61.
8. Matsuzaki, H., et al. Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high density oligonucleotide array. Genome Res, 2004, 14: p. 414-425.
9. Goldstein, D.B., and Cavalleri, G.L. Genomics: Understanding Human Diversity. Nature, 2005, 437 (7063): p. 1241-2.