FAIR USE NOTICE. This document contains copyrighted material whose use has not been specifically authorized by the copyright owner. The CHANCE project is making this material available as part of our mission to promote critical thinking about statistical issues. We believe that this constitutes a 'fair use' of the copyrighted material as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
Can DNA typing uniquely identify the source of a sample? Because any two human genomes differ at about 3 million sites, no two persons (barring identical twins) have the same DNA sequence. Unique identification with DNA typing is therefore possible provided that enough sites of variation are examined.
However, the DNA typing systems used today examine only a few sites of variation and have only limited resolution for measuring the variability at each site. There is a chance that two persons might have DNA patterns (i.e., genetic types) that match at the small number of sites examined. Nonetheless, even with today's technology, which uses 3-5 loci, a match between two DNA patterns can be considered strong evidence that the two samples came from the same source.
Interpreting a DNA typing analysis requires a valid scientific method for estimating the probability that a random person might by chance have matched the forensic sample at the sites of DNA variation examined. A judge or jury could appropriately weigh the significance of a DNA match between a defendant and a forensic sample if told, for example, that "the pattern in the forensic sample occurs with a probability that is not known exactly, but is less than 1 in 1,000" (if the database that shows no match with the defendant's pattern is of size 1,000).
To say that two patterns match, without providing any scientifically valid estimate (or, at least, an upper bound) of the frequency with which such matches might occur by chance, is meaningless.
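The counting bound quoted above can be illustrated with a short sketch. The function names are hypothetical and chosen only for this example. The first function returns the informal bound of 1 in N for a database of N profiles that contains no match. The second is an added assumption, not a calculation stated in the passage: it treats the N profiles as independent draws and solves (1 - p)^N = 1 - confidence for the largest frequency p still compatible with seeing no match.

```python
def naive_upper_bound(database_size):
    """Informal bound from the text: no match among N profiles, so 'less than 1 in N'."""
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    return 1 / database_size


def upper_confidence_limit(database_size, confidence=0.95):
    """One-sided upper confidence limit on the pattern frequency.

    Assumption (not from the text): the N profiles are independent draws,
    so the chance of zero matches is (1 - p) ** N. Setting that equal to
    (1 - confidence) and solving for p gives the limit returned here.
    """
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - (1 - confidence) ** (1 / database_size)


if __name__ == "__main__":
    n = 1000  # database size used in the text's example
    print("naive bound:", naive_upper_bound(n))
    print("95% upper limit:", upper_confidence_limit(n))
```

Under these assumptions, the 95% upper limit for a database of 1,000 is roughly 3 in 1,000, somewhat more conservative than the informal figure of 1 in 1,000. Either way, the point of the passage stands: the number reported is an upper bound, not an exact frequency.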
Substantial controversy has arisen concerning the methods for estimating the population frequencies of specific DNA typing patterns.[1-14] Questions have been raised about the adequacy of the population databases on which frequency estimates are based and about the role of racial and ethnic origin in frequency estimation. Some methods based on simple counting produce modest frequencies, whereas some methods based on assumptions about population structure can produce extreme frequencies. The difference can be striking: In one Manhattan murder investigation, the reported frequency estimates ranged from 1 in 500 to 1 in 739 billion, depending on how the statistical calculations were performed. In fact, both estimates were based on extreme assumptions (the first on counting matches in the databases, the second on multiplying lower bounds of each allele frequency). The discrepancy not only is a question of the weight to accord the evidence (which is traditionally left to a jury), but bears on the scientific validity of the alternative methods used for rendering estimates of the weight (which is a threshold question for admissibility).
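To see how two methods can differ by many orders of magnitude, consider the following sketch. It is an illustration only and does not reproduce the calculations in the Manhattan case. The allele frequencies and database size are hypothetical. The multiplication method shown assumes Hardy-Weinberg proportions within each locus and independence across loci, which are exactly the kinds of assumptions about population structure that the controversy concerns.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Genotype frequency under assumed Hardy-Weinberg proportions:
    p * p for a homozygote, 2 * p * q for a heterozygote."""
    return p * p if q is None else 2 * p * q


def multiplication_estimate(loci):
    """Multiply per-locus genotype frequencies, assuming the loci are independent."""
    return prod(genotype_frequency(*alleles) for alleles in loci)


def counting_estimate(database_size):
    """Counting bound when the pattern is absent from a database of the given size."""
    return 1 / database_size


# Hypothetical four-locus profile: three heterozygous loci, one homozygous locus.
hypothetical_profile = [(0.10, 0.05), (0.08, 0.04), (0.06, 0.03), (0.05,)]

if __name__ == "__main__":
    by_counting = counting_estimate(500)  # hypothetical database of 500 profiles
    by_multiplying = multiplication_estimate(hypothetical_profile)
    print(f"counting:       1 in {1 / by_counting:,.0f}")
    print(f"multiplication: 1 in {1 / by_multiplying:,.0f}")
```

With these made-up inputs, the counting method cannot report anything rarer than 1 in the database size, whereas the multiplication method yields a frequency on the order of 1 in a billion. The gap comes from the independence assumptions, not from additional data.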
In this chapter, we review the issues of population genetics that underlie the controversy and propose an approach for making frequency estimates that are independent of race and ethnic origin. This approach addresses the central purpose of DNA typing as a tool for the identification of persons.