!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 9.06

May 7, 2000 to June 6, 2000

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

http://www.dartmouth.edu/~chance

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

===========================================================

Number theory is pathological randomness

                                                                                          Bob Brooks

===========================================================

Contents of Chance News 9.06

All over the world the key indicator to a man of a woman's fertility is the relationship of her hip measurement to that of her waist. A ratio of 0.7 is deemed ideal.

The Observer 27 February 2000          

 

One in 100 jobless - the unemployment rate in the 11-nation European Union was steady in September at 10%, representing 12.8 million unemployed people.

Cape Argus (Capetown) 10 November 1999

Gilbert Bukenya, MP for Busiro North, was last night re-elected chairman of the Movement parliamentary caucus. Sources that attended the closed meeting said Bukenya scored 137 votes while his opponent Chris Mudoola got 37. Some of the 150 MPs at the meeting did not vote.

New Vision (Uganda) 24 February 2000

                                        <<<========<<



>>>>>==============>
Rex Boggs at Glenmore State High School in Rockhampton, Queensland, Australia, maintains an interesting web site to assist teachers of the basic statistics courses. This web site

http://exploringdata.cqu.edu.au

contains activities, datasets and assessments to support data exploration. It also contains articles on topics of interest to statistics teachers, some of which grew out of statistical discussion groups. This web site has an elegant "presentation". We enjoyed poking around in it. We particularly enjoyed finding the transcript of a lively interview with Jane Watson on the Australian National radio about the need for statistical literacy.
                                        <<<========<<



>>>>>==============>
Here are two interesting forthcoming conferences.

Beyond the Formula IV

This will be the fourth "Beyond the Formula" conference. These conferences are held every year at Monroe Community College. They deal with the first statistics course, covering: Curriculum, Teaching techniques, Technology, and Applications. While all areas are covered each year, one area is emphasized. Last year's conference emphasized technology (see Chance News 8.07). This year's conference, to be held August 3-4, emphasizes Curriculum with main themes: statistical reform, evaluating reform, and the role of probability in introductory statistics. As in previous years, this year's conference has an impressive list of speakers including, among others, Richard Scheaffer, Joan Garfield, Robert Hogg and Kyle Siegrist. More details and registration information can be found at:

http://www.monroecc.edu/depts/math/beyond1.htm.
                                        <<<========<<



>>>>>==============>
The Second International Research Forum on Statistical Reasoning, Thinking and Literacy (SRTL2)

This is the second international forum to be offered by the IASE Statistical Education Research Group (IASE SRG). We attended SRTL1, held last summer at Kibbutz Be'eri, Israel, and it was a wonderful experience (see Chance News 8.07). SRTL2 will be held at the University of New England, Armidale, Australia, August 15-20, 2001. This forum is being organized by Chris Reading, Joan Garfield, and Dani Ben-Zvi. You can find out more information about SRTL1 and how to apply to attend SRTL2 at: http://www.beeri.org.il/srtl.
                                        <<<========<<



>>>>>==============>
Our colleague Dan Rockmore appears regularly on Vermont Public Radio in a series of commentaries that tell listeners where to look for mathematics in their everyday lives. Previous topics have included: Math in the Movies, The Rare Beauty of Nine, Mathematics of the Millennium, and, most recently, "Getting Lost to get Found".

"Getting Lost to get Found" was suggested by the attempt by two of Dan's friends, Peter Doyle and Bob Drake, to find a missing graveyard in a neighboring town. Peter attributed their lack of success to his being unable to carry out a truly random walk, which would be expected to cover all points. This led Dan, in his commentary, to explain the famous Polya theorem that a random walker will cover all points in one and two dimensions but not in three dimensions. Dan included the story of how Polya came to consider this problem. This is truly a romantic story and you can hear it by listening to Dan's most recent commentary at:

http://hilbert.dartmouth.edu/news.
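
For readers who would like to see Polya's theorem at work, here is a minimal simulation sketch. It is our own illustration and is not taken from Dan's commentary. It uses the standard simplification of asking whether a walker on the integer lattice returns to its starting point, rather than whether it covers all points. The function names, the number of walks, the step limit and the seed are arbitrary choices, and a finite step limit can only hint at the long-run behavior.

```python
import random

def returns_to_origin(dim, max_steps, rng):
    """Walk on the integer lattice Z^dim; report whether the origin is revisited."""
    pos = [0] * dim
    for _ in range(max_steps):
        axis = rng.randrange(dim)          # pick one coordinate direction
        pos[axis] += rng.choice((-1, 1))   # step forward or back along it
        if not any(pos):                   # all coordinates are zero again
            return True
    return False

def return_fraction(dim, walks=300, max_steps=500, seed=0):
    """Fraction of simulated walks that come back to the start within max_steps."""
    rng = random.Random(seed)
    hits = sum(returns_to_origin(dim, max_steps, rng) for _ in range(walks))
    return hits / walks

if __name__ == "__main__":
    for dim in (1, 2, 3):
        print(dim, return_fraction(dim))
```

With longer step limits the fractions in one and two dimensions should creep toward 1, slowly in two dimensions, while the fraction in three dimensions should level off well below 1.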
                                        <<<========<<



>>>>>==============>
Tom Moore wrote us that one of his students, Emile, wondered what is the probability that a home run will be hit into one of the gloves on top of two of the new ballparks. Referring to an article "Who's got the best new ballpark?" in the March 27, 2000 issue of Sports Illustrated, Emile writes:

At Pacific Ball Park in S.F., there is a fiberglass glove in left field that measures 26 feet high and 32 feet wide. However, it is 518 feet from home plate. At Comerica Park in Detroit, they have a large statue of Al Kaline with him holding a glove up in the air. This one is about at centerfield, perhaps about 20 or 30 feet to the left field side (Editor: 395 feet from home plate).

DISCUSSION QUESTION:

How would you answer Emile's question?
                                        <<<========<<



>>>>>==============>
Joan Garfield suggested the following article and provided the write-up for it.

Word for word/junk food psychology; Triscuits and cheez doodles as windows into the soul.
The New York Times, January 30, 2000, Sec. 4, P. 4
Tom Kuntz

"THERE'S junk science, and then there's junk-food science."

This line caught the attention of a colleague of mine who gave me this article to read. It describes the research of Dr. Alan R. Hirsch, a neurologist and psychiatrist who runs the Smell and Taste Treatment and Research Foundation in Chicago. It was not surprising that this study, which links junk food to personality types, was commissioned by the Snack Food Association and the National Potato Promotion Board.

According to Hirsch, our preferences for different types of snacks can provide insight into our personality and character structure. He even believes that "through our food preferences and choices we reveal inner thoughts, feelings, wishes and desires" and "that, in many respects, the foods we choose provide a window to the unconscious." Thus, food preference could theoretically be used as a projective personality test.

This idea intrigued my students who tried to cross validate (using themselves as subjects) the results of Hirsch's correlational study (n=800) which related personality and depression test scores to preferences for six different snack foods: potato chips, tortilla chips, pretzels, snack crackers, cheese curls and meat snacks.

Here are Hirsch's descriptions of people who prefer the different types of snacks:

POTATO CHIPS -- Those who love potato chips are ambitious, successful, high achievers. They enjoy rewards and trimmings of their success -- both in business and family life. They feel pride and happiness when their spouse and children are also successful. They seek nothing less than the best in those around them. Potato chip lovers are easily frustrated and indignant at life's inconveniences. . . .

Note: Whether in business, sports or a social situation, expect to lose if you enter into competition with a potato chip lover; they are worthy and prepared adversaries.

TORTILLA CHIPS -- Tortilla chip lovers aren't satisfied with getting a grade of A, it must be an A+. Their concern extends beyond their own actions, and also to the community at large -- they are distressed by the inequities and injustices of society. This concern for how others feel would make them an ideal house guest.

PRETZELS -- Lively and energetic, those who prefer pretzels crave novelty and easily become bored by the usual routine. They are excited by the challenge -- whether it be at work, sports or home. They thrive in the world of abstract concepts and tend to lose interest in the mundane day-to-day world. There is a tendency to initiate new projects without having completed the last, and to over commit to work or family chores.

Happy in the role as a "flirt," pretzel lovers are comfortable dressed in an attractive manner. . . . A welcome addition to any grouping -- they are "the life of the party.". . .

SNACK CRACKERS -- Contemplative, thoughtful . . . shy and introspective, snack cracker eaters avoid confrontation so as not to hurt the feelings of others. They have many diverse interests and are involved in a multitude of projects simultaneously, all competing for their time and attention. They value their private time and are most creative when allowed to be alone, free from their daily responsibilities and interruptions.

Note: Those who prefer crackers may easily find themselves romantically involved in an Internet relationship.

CHEESE CURLS -- Formal, always proper, conscientious, principled, the cheese curl lover maintains the moral high ground with . . . family, work and romantic partners. They have a fine sense of right and wrong and justly treat those with whom they interact -- the C.E.O. is treated with the same fairness and concern as the bus boy.

Rather than showing reckless disregard for the future, they plan ahead, anticipate any possible future catastrophes. Whether it be Band-Aids or batteries, the cheese curl lover's house is stocked and ready.

Note: If it is so spotless you can eat off the kitchen floor, you are in the domicile of one who craves cheese curls.

MEAT SNACKS -- Gregarious, social . . . They are generous to a fault, and will make extraordinary self-sacrifices to please others. Those who prefer meat snacks are loyal and true friends who can always be trusted. However, their over trusting nature predisposes them to emotional turmoil, especially when breaking up with a lover.

Note: Those who prefer meat snacks are prone to, and thus should be careful to avoid, rebound relationships!

My students felt that their personalities matched rather well with the descriptions linked to their snack preferences. However, they expressed the opinion that reading these descriptions felt like reading daily horoscopes in the newspaper, where so many different predictions may fit a particular person.

Wanting to learn more about the details of this study, we asked Laurie Snell to provide us with the original research report (knowing that he is always able to fulfill this type of request), and after many efforts, he was able to produce the original report, "Snack food Hedonics and Personality," released February 1, 2000. (Note: hedonics is the field of psychology that measures feelings). Although this report provides more information on the background and rationale for the study, its account of the methods is no more detailed than the one given in the original NYT article (i.e., the subjects were volunteers, ages ranged from 17-77, 73% female, the names of the tests are given, and the analysis used was correlation). We still don't know what type of correlational analysis was used, how it was used, or how strong the different results were. We certainly find the following claim to be questionable based on the results of the study:

"Food hedonics, whether it be snack food, fruit or ice cream, has potential utility as a projective test for psychiatric illness as well as personality typing in subjects without pathology. Future research establishing cross-geographics and cross-cultural validity of this projective testing is warranted."

DISCUSSION QUESTIONS.

(1) What details would you like to know about this study in order to better critique the results?

(2) What types of analysis could be used to produce the results that Hirsch presents?

(3) How might you design a more convincing study that links food preferences to personality type?
                                       <<<========<<



>>>>>==============>
Lucky numbers; A casino chain finds a lucrative niche: the small spenders.
The Wall Street Journal, 4 May 2000, A1
Christina Binkley

With the spread of legalized gambling in the 1990s, new casinos sprang up in many locations. Harrah's was often among the most popular, only to see its business eroded by flashier new establishments. But continuing investment in ever-more-lavish facilities to lure gamblers is an expensive way to compete. Harrah's has turned to statistics to take a careful look at its existing customer base, particularly the "small spenders," with an eye towards enhancing revenue.

As it turned out, there were plenty of data available. The casino had been providing "frequent gambler cards" that customers presented before playing. The customers got information on how to get free meals, rooms or show tickets, while the casino got data on gambling behavior. Originally, the idea was that providing give-aways, at certain levels of gambling, would increase the amount people spent at the casino, but this turned out not to work. Then Harrah's chairman Phil Satre got a tip from Coca-Cola's marketing experts, who suggested hiring a chief operating officer with a background in marketing. Satre offered the job to a Harvard Business School professor named Gary Loveman, who has applied sophisticated market analysis techniques to Harrah's data.

The frequent gambler cards provided data on everything from where players live to how fast they play a slot machine. This has turned out to be a gold mine for market research. Some of Loveman's analysis uses the logic of clinical trials. In one test, Harrah's divided a sample of frequent slot machine players into two groups. The "controls" were offered a standard marketing package worth $125, including a free room, two steak dinners and $30 in gambling chips. The "treatment" group was simply offered $60 in chips. Even though the latter was much less expensive for the casino, it produced more gambling. As a result, Harrah's has now instituted the new promotion scheme and has seen it generate more than twice the profit per person-trip.

Another promotion targeted a group of individuals who lived near casinos and were identified as "avid gamblers" by the speed with which they hit slot machine buttons. The company tried to encourage back-to-back visits by making cash and food offers that expired in consecutive two-week periods. In the target group, the average number of trips per month jumped from 1.1 to 1.4.

DISCUSSION QUESTION:

In the gambling industry, it is widely believed that gamblers are sensitive to slight changes in slot machine odds. Mr. Loveman questions whether this might be just a myth and believes that "there's a lot of money to be made for the person who has an answer to that." How would you design a study to find out?
                                      <<<========<<



>>>>>==============>
Ask Marilyn
Parade Magazine, 28 May, 2000
Marilyn vos Savant

A reader writes:

At work, we had a contest in which the prize was a new car. The six finalists could choose from six keys, only one of which would start the car. In an order chosen at random, each person would select a key and try it. (If the key didn't work, it would be discarded.) The second person picked the key that started the car, so the last four people didn't even get an opportunity to choose a key at all. In this kind of contest, would you want to go first?

Bryan Delaney, Sioux City, Iowa

Marilyn proposes also a "trickier" version:

Say that Bryan drops his own key into that original box of six keys, making the number of keys total seven instead. He's going to randomly remove one key at a time and then try to start the car (If the car doesn't start, it would be thrown out.)

Marilyn gives the following answer at the end of her column:

Bryan's chance of retrieving his own key on the second trial is still one out of seven: in the contest there is no advantage to any particular position.
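
The claim that no position has an advantage can be checked with a short exact calculation. The sketch below is our own illustration, not part of Marilyn's column; the function name is made up for the example. It assumes the keys are drawn at random without replacement and that drawing stops as soon as the working key is found.

```python
from fractions import Fraction

def win_probabilities(n_keys):
    """P(the k-th person to draw picks the working key), for k = 1..n_keys."""
    probs = []
    still_unfound = Fraction(1)            # P(no earlier person has won)
    for remaining in range(n_keys, 0, -1):
        # Given the key is still in the box, each remaining key is equally likely.
        p_here = still_unfound * Fraction(1, remaining)
        probs.append(p_here)
        still_unfound -= p_here
    return probs

if __name__ == "__main__":
    print(win_probabilities(6))   # the original contest
    print(win_probabilities(7))   # the seven-key version
```

For the second person in the six-key contest the calculation is (5/6)(1/5) = 1/6: the chance that the first person fails, times the chance of then picking the right key from the five that remain.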

DISCUSSION QUESTIONS:

(1) In Marilyn's version, do you think that one of the other players could win Bryan's car?

(2) If the prize is awarded on the second draw, how would you explain to the person who was due to draw third that she hadn't really 'lost her chance'?

(3) In a previous column Marilyn was asked the following similar question (See Chance News 5.01 and 5.02):

Say 10 tickets are numbered 1 through 10 in a drawing. Half the numbers are even and half are odd. The first ticket is drawn, and it's No. 3 which is odd. That leaves five even numbers and four odd ones. Doesn't this mean that the next ticket to be drawn is more likely to be even? If I buy a ticket at this point, wouldn't I have a better chance of winning the next draw by choosing an even number?

Joe Ball, Pittstown, NJ            

How would you answer Joe?
                                                         <<<========<<



>>>>>==============>
In detecting liars, actions speak louder than words.
Boston Globe, 11 May 2000, A14
Richard A. Knox

A story from Chance News 8.05 described how difficult it can be to determine when someone is lying. The present article sums up past research as indicating that most people do no better than a coin toss. But recent research published in the journal Nature has found that stroke victims who have lost the ability to comprehend speech--a condition known as receptive aphasia--have a success rate near 75% identifying people who are misrepresenting their true emotional feelings.

The researchers speculate that injury to the speech-comprehension region of the brain's left hemisphere is compensated for by areas of the right hemisphere that detect changes in facial expressions. Such expressions could give away emotions that the speaker might be trying to hide. The new findings are consistent with effects reported by neurologist Oliver Sacks, the author of "The Man Who Mistook his Wife for a Hat." Sacks has described how a group of aphasia patients watching a televised speech by Ronald Reagan were frequently moved to laughter. One of the researchers on the present study, psychologist Paul Ekman of the University of California at San Francisco, explained that the aphasiacs perceived that "Reagan didn't look like someone who was believing what he was saying."

Here is how the new study was conducted. Ten aphasiac patients who could follow simple instructions were chosen to be tested. They evaluated videotapes of nurses who were watching tapes and allegedly describing what they saw. In all cases, the nurses were instructed to provide pleasant descriptions, even though some of the material they watched depicted unpleasant medical scenes. The subjects were asked to indicate when they thought the nurses were lying. Their performance was compared with three other groups of non-aphasia patients: 10 healthy adults comparable in age and sex to the aphasia patients, 10 patients with right-hemisphere brain damage, and 48 healthy MIT undergraduates.

Overall, the 68 subjects in the three non-aphasia groups did no better than coin-tossing at identifying lies. The aphasia patients had a 60% success rate when relying on both visual and vocal cues, and a 73% success rate when focusing on facial expressions alone. This gives further support to the theory that aphasia patients develop enhanced ability to process visual cues. In fact, it turns out that the one aphasia patient who did no better than a coin toss was also the only one whose stroke had occurred within the last year.

DISCUSSION QUESTIONS:

(1) Can you tell if the 60% success rate is statistically significant? What else would you need to know?

(2) Do you think a 60% success rate is practically significant? In the general population, do you think a person with this skill would come to be known as a good lie detector? How do you think aphasia patients were originally identified as possibly possessing such a skill?
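
As a starting point for question (1), here is a minimal sketch showing how the evidence carried by a 60% success rate depends on the number of judgments made. The article as summarized here does not give that number, so the trial counts below are purely hypothetical. The sketch also assumes the judgments are independent, which pooled judgments from the same ten patients may not be.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    # Hypothetical (trials, correct) pairs, each with a 60% success rate.
    for n, k in [(10, 6), (50, 30), (100, 60), (200, 120)]:
        print(n, k, round(upper_tail(n, k), 4))
```

The same observed rate gives a smaller one-sided tail probability as the number of trials grows, which is one reason the rate alone cannot settle the question.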
                                   <<<========<<



>>>>>==============>
Why did we eat all that oat bran? 'Cancer prevention' diet had too little evidence, not enough skepticism.
The Boston Globe, 14 May 2000, E1
Thomas J. Moore

In 1975, the National Cancer Institute (NCI) assembled leading specialists to investigate the link between diet and cancer. Some of the early recommendations seemed so sensible and fit so well with prevailing wisdom on diet, that they were widely publicized before randomized clinical trials had been completed. The 'cancer prevention' diet is familiar to anyone who follows health news: lots of fiber, limited fat, and at least five servings of fruits and vegetables a day. However, as the article reports, subsequent studies on such diets have repeatedly failed to find any preventative benefits.

The article does a nice job summarizing the history of these ideas. In 1981, epidemiologists Richard Doll and Richard Peto of Oxford University published an influential study that attempted to rank all causes of cancer. They found that air pollution, toxic chemicals and food contaminants were not important factors. This left diet as a key factor, even though it was not clear which foods were involved. As the article points out, it is difficult to study individual foods in isolation; eating a high-fiber muffin, for example, may displace something else in the diet. Also, people's menus vary from day to day, making it difficult to get precise data (see Chance News 9.03 for a discussion of misreporting on food surveys). Doll and Peto expressed the considerable uncertainty in their findings by reporting that between 10% and 70% of cancers were linked in some way to diet. But the public likes to fix attention on a single number rather than a range, and the article speculates that the widely cited figure that "one-third of cancers" are linked to diet may have arisen from the approximate midpoint of Doll and Peto's range.

NCI scientists of course recognized the need for clinical trials to investigate the nature of the proposed links between cancer and diet. Beta carotene and dietary fiber were identified early on as candidates for study. Lung cancer was the leading cancer killer, and some evidence suggested that antioxidants were helpful in its prevention. This focused attention on beta carotene, an antioxidant common in certain foods. The second leading killer was colorectal cancer, and fiber appeared to have a preventative effect there.

Beta carotene was the first factor to be studied in clinical trials. The research has been extensive, with the longest studies ranging from 8 to 12 years. One looked at 22,000 physicians, a low risk group. Another focused on asbestos workers, who are at high risk for cancer. But no large scale study has found any benefit (for example, see "Beta carotene flunks again" in Chance News 9.01), and one study even found a possible harmful effect.

Fiber, fruits and vegetables were the next factors to be systematically investigated. For the case of colorectal cancer, it was possible to conduct more focused experiments by looking at individuals who had already developed the benign polyps that can later become malignancies. However, there remained the difficulty of controlling diet. One study divided subjects into two groups who would eat one of two identical looking breakfast cereals that differed in fiber content. The "low-fiber" cereal had 2 grams per serving while the "high-fiber" cereal had 13 grams. Another study involved more aggressive dietary changes: increasing fruit and vegetable consumption by two-thirds, increasing fiber consumption by three-quarters, as well as reducing fat intake. The subjects remained on the diets for three to four years. Yet again, neither regime produced any significant cancer benefit. In fact, there were slightly more cancers in the high-fiber group.

With all of these negative results, it may seem surprising that the recommendations of the "cancer prevention" diet are still so often repeated. One reason is that, in any case, the diet is not bad for you, and probably has benefit in other health issues like weight control. But another reason, which the article sadly notes, is that advocates of the diet seem unwilling to concede that they made a mistake, even when faced with the scientific evidence.

DISCUSSION QUESTION:

In light of all this, do you think that the National Cancer Institute should make recommendations on diet?


                                  <<<========<<



>>>>>==============>
Doctor lashes out in Prozac battle.
The Boston Globe, 15 May 2000, A1
Richard A. Knox

Prozac revisited: As drug gets remade, concerns about suicides surface.
The Boston Globe, 7 May 2000, A1
Leah R. Garnett

In 1990, Harvard psychiatrist Jonathan Cole and colleagues published a study linking the antidepressant drug Prozac to akathisia, a state of agitation that can lead to suicide. Dr. Cole's work is in the news again because he has come out in support of a federal lawsuit pending against pharmaceutical firm Pfizer. Pfizer manufactures Zoloft, another antidepressant belonging to the class of drugs known as selective serotonin reuptake inhibitors (SSRIs). The suit was brought by the parents of a Missouri teenager who committed suicide in 1997 while being treated with Zoloft. Cole charges that drug companies and the FDA are failing to take the problem seriously.

According to Cole, studies done in Texas in 1993 and 1995 found that about 1 in 200 patients reported "new suicidal thoughts" while taking Prozac; none of the patients taking an older drug unrelated to SSRIs had such thoughts. Cole said that rate "sounds about right to me. [It's] rare enough to make most physicians not notice the effect, but common enough to cause serious adverse effects, such as death by suicide, in a few patients." He added that large scale studies would be needed to precisely estimate the frequency of suicides associated with SSRIs. Suicide is rare, even among people suffering depression, and researchers would need to further distinguish suicidal episodes related to depression itself from those related to the drug.

The earlier Globe article reports on old findings that have recently come to light as the patent on Prozac is due to expire, and manufacturer Eli Lilly seeks to introduce a new formulation of the drug. Internal company data indicate that, during early clinical trials for the original Prozac, 1 in 100 previously non-suicidal patients experienced akathisia which led them to attempt or commit suicide. In 1987, Germany's equivalent of our FDA initially denied approval of the drug, based on Lilly's own studies showing that previously non-suicidal patients had a five times higher risk of suicide and suicide attempts compared with patients on older drugs, and three times higher than those on placebo. The drug was eventually approved, but it is required to carry labeling that warns of suicide risk, and doctors are advised to consider prescribing sedatives along with the drug. In the US there is no such warning.

Dr. David Healy, brain researcher at the University of Wales, estimates that worldwide "probably 50,000 people have committed suicide on Prozac since its launch, over and above the number who would have done so if left untreated." (Dr. Healy has been called as an expert witness in the lawsuit discussed earlier.) More than 35 million people worldwide were taking Prozac as of 1999; sales of the drug represent 25% of Lilly's revenues.

DISCUSSION QUESTIONS:

(1) Would you agree with Dr. Cole's assessment that a 1 in 200 effect would escape the notice of most physicians? How common do you think an effect would need to be before it was readily picked up?

(2) What kind of study could ethically be done to determine whether use of SSRIs can lead to suicide?

(3) How do you think Dr. Healy arrived at his estimate for the number of suicides attributable to Prozac?
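
One way to begin thinking about question (1) is to ask how likely a single physician is to encounter even one case. The sketch below is our own illustration under loudly stated assumptions: the effect occurs independently in each patient at the 1 in 200 rate, and the patient counts are hypothetical. Encountering a case is of course not the same as attributing it to the drug.

```python
def prob_at_least_one(rate, n_patients):
    """Chance of at least one case among n_patients, assuming independence."""
    return 1 - (1 - rate) ** n_patients

def expected_cases(rate, n_patients):
    """Expected number of cases among n_patients."""
    return rate * n_patients

if __name__ == "__main__":
    rate = 1 / 200
    # Hypothetical numbers of patients a physician might treat with the drug.
    for n in (20, 50, 100, 200, 500):
        print(n, round(prob_at_least_one(rate, n), 3), expected_cases(rate, n))
```

Under these assumptions a physician would need to treat roughly 140 patients before the chance of seeing at least one case passes one half.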
                                      <<<========<<



>>>>>==============>
  Dan Rockmore also suggested the following related articles. They all report research that uses a new form of DNA fingerprinting to identify populations with a common ancestry.

If biology is ancestry, are these people related?
New York Times 9 April, 2000, section 4 p. 4
Nicholas Wade

Surnames and the Y chromosome.
American Journal of Human Genetics, April 2000, 1417-1419
Bryan Sykes and Catherine Irven

DNA as detective again, but on a biblical case.
The New York Times, 21 Feb. 2000, E7
Walter Goodman

Lost tribes of Israel.
Nova WGBH Boston, 22 February, Videotape $19.98 (800-645-4PBS)
Lost tribes site: http://www.pbs.org/wgbh/nova/israel/

Origins of Old Testament priests.
Nature 9 July 1998
Thomas, Skorecki, Ben-Ami, Parfitt, Bradman, Goldstein

Y chromosomes traveling south: The Cohen Modal Haplotype and the origins of the Lemba--the "black jews" of Southern Africa.
American Journal of Human Genetics 66:674-686,2000
Thomas, Parfitt, Weiss, Skorecki, Wilson, le Roux, Bradman, Goldstein

In his New York Times article Nicholas Wade discusses the use of a new kind of DNA fingerprinting to determine the origin of a surname. Walter Goodman in his Times article and the NOVA program describe how this same method has been used to support the claims of the Lemba people of South Africa that they have a Jewish heredity.

This new DNA fingerprinting uses the fact that, unlike other chromosomes, most of the DNA on the Y chromosome is passed on without change from father to son. Mutations can cause small changes in the DNA called polymorphisms or markers.

There are four types of polymorphisms:

(1) A sequence of letters at a specific location, for example GATG, which is repeated a small number of times, typically 2 to 7, is passed on to a son with the number of repetitions changed. For example, if it is repeated 4 times, in the next generation it might be repeated 3 or 5 times. (It is believed that the number will change by only 1 in a single generation.) Such a mutation is called a microsatellite.

(2) Same as a microsatellite but the number of repetitions is large, typically 10 to 60. This is called a minisatellite.

(3) At a specific location on the DNA, a string of letters is inserted or deleted.

(4) At a specific position on the DNA a letter is changed, say from a G to a T.

Changes of type (1) and (2) can occur several times in the evolution of a chromosome with those of type (1) occurring more often than those of type (2). Changes of type (3) and (4) are expected to occur at most once in the evolution of the chromosome and are called "unique event polymorphisms" (UEP's).

This classification is presented in an interesting article on this kind of DNA fingerprinting:

Why Y? The Y chromosome in the study of human evolution, migration and prehistory.
Science Spectra (http://www.gbhap.com/Science_Spectra)
Neil Bradman and Mark Thomas

On the Lost Tribes web site, you are given a family tree with the names and DNA of four male members located on the tree. You are then given the DNA for six other relatives and asked to place them on the tree in a way consistent with the changes in their DNA. This simplified model gives a good feeling for how geneticists use changes in the DNA to learn about the history of the male branch of a family.
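
A minimal sketch of the kind of comparison behind such an exercise, assuming every marker is a microsatellite and that each mutation changes one marker's repeat count by exactly 1 (the belief mentioned in the classification above). Under that assumption, the sum of the differences in repeat counts is the smallest number of mutations that could separate two Y chromosomes. The function names and the haplotype strings are illustrative only; this is not the method used on the NOVA site or in the papers.

```python
def parse_haplotype(text):
    """Turn a string such as '15-23-11-14' into a tuple of repeat counts."""
    return tuple(int(part) for part in text.split("-"))

def step_distance(a, b):
    """Minimum number of one-repeat mutations separating two haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must use the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))

if __name__ == "__main__":
    reference = parse_haplotype("15-23-11-14")
    for other in ("15-23-10-14", "14-24-11-13"):
        print(other, step_distance(reference, parse_haplotype(other)))
```

Relatives whose haplotypes are a small number of steps apart are natural candidates for nearby positions on the tree, although the true number of mutations may be larger than this minimum.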

In his New York Times article, Wade describes how Bryan Sykes, a geneticist at Oxford, started a new company to use polymorphisms to help people trace the origins of their surnames.

Surnames came into use in England between 1250 and 1450. The names were often associated with trades so, for example, one would expect a number of different original Smiths. Bryan started by considering his own surname, Sykes. He wanted to see if his name could be traced back to a single family or if several unrelated males chose Sykes as their surname.

Bryan found that most of the Sykeses came from three counties, matching the area of the earliest known occurrences of Sykes in the 13th and 14th centuries. He sent letters to 269 men randomly chosen from these counties, asking them to send him a sample of cells brushed from the inside of their cheek with a cotton swab. Bryan received a total of 61 responses and ended up with DNA which could be analyzed from 48 of these. He included in his study two control groups: a sample of 139 native English males chosen at random from all over the country, and a group of 21 male neighbors recruited by the Sykes volunteers.

Sykes chose four standard microsatellites and determined the number of repetitions for each of his samples. A particular set of values of such markers is called a haplotype. Here is his data, with the numbers in parentheses indicating the percentage of each sample that had a particular haplotype.

Haplotype      Sykes        Neighbors    English
13-22-10-12     2 (4.1)      0             1 (0.7)
13-22-11-12     0            0             3 (2.2)
14-22-10-12     0            2 (9.5)       0
14-22-10-13     0            2 (9.5)       9 (6.5)
14-23-10-13     1 (2.1)      0             9 (6.5)
14-23-10-14     0            1 (4.8)       1 (0.7)
14-23-11-13     1 (2.1)      2 (9.5)      26 (18.7)
14-23-11-14     1 (2.1)      0             3 (2.2)
14-24-10-13     0            3 (13.2)     22 (15.8)
14-24-11-13     5 (10.4)     1 (4.8)      36 (25.9)
14-24-11-14     0            0             6 (4.3)
14-25-10-13     0            1 (4.8)       5 (3.6)
14-24-11-13     3 (6.3)      1 (4.8)       4 (2.9)
15-21-11-12     0            1 (4.8)       0
15-22-10-12     0            1 (4.8)       0
15-23- 9-13     0            1 (4.8)       0
15-23-10-13     3 (6.3)      1 (4.8)       3 (2.2)
15-23-10-14     3 (6.3)      1 (4.8)       4 (2.9)
15-23-11-14    21 (43.8)     0             0
15-24-10-12     1 (2.1)      0             4 (2.9)
15-24-10-13     0            1 (4.8)       0
15-25-11-13     1 (2.1)      0             3 (2.2)
15-25-11-14     3 (6.3)      1 (4.8)       0
15-26-11-13     3 (6.3)      0             0
17-23-10-13     0            1 (4.8)       0
Total          48           21           139

The fact that the haplotype 15-23-11-14 occurs in 43.8% of the Sykeses, and that no other haplotype is common to a significant number of them, suggests that there was only one original Sykes, who had the "Sykes haplotype" 15-23-11-14. Note that no one in either of the control populations had this haplotype. Had two unrelated men taken on Sykes as a surname, it would be expected that they would have had different haplotypes and that both of these would have shown up in a significant number of the current Sykeses in the sample.

What about the other 27 current Sykeses who had different haplotypes? Two haplotypes are only one mutation away from the Sykes haplotype, and so these differences might have been caused by a mutation. However, the other 25 haplotypes would require more mutations than could reasonably be expected in a 700-year period. Thus Sykes suggests that these are due to what geneticists delicately call "non-paternal" events (such as an unfaithful wife).
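To see which haplotypes are "one mutation away," here is a minimal sketch in Python. It assumes, as described above for microsatellites, that a single mutation changes one repeat count by exactly 1, so the number of mutations separating two haplotypes is at least the sum of the absolute differences in their repeat counts. The names SYKES_COUNTS and step_distance are ours, not from the study; the counts are copied from the Sykes column of the table as printed.

```python
# Sykes-column counts copied from the table as printed. The label
# 14-24-11-13 appears twice in the source table, so the data is kept as
# a list of (haplotype, count) pairs rather than a dictionary.
SYKES_COUNTS = [
    ((13, 22, 10, 12), 2),
    ((14, 23, 10, 13), 1),
    ((14, 23, 11, 13), 1),
    ((14, 23, 11, 14), 1),
    ((14, 24, 11, 13), 5),
    ((14, 24, 11, 13), 3),   # second row printed with the same label
    ((15, 23, 10, 13), 3),
    ((15, 23, 10, 14), 3),
    ((15, 23, 11, 14), 21),  # the "Sykes haplotype"
    ((15, 24, 10, 12), 1),
    ((15, 25, 11, 13), 1),
    ((15, 25, 11, 14), 3),
    ((15, 26, 11, 13), 3),
]

SYKES_HAPLOTYPE = (15, 23, 11, 14)


def step_distance(a, b):
    """Minimum number of one-step mutations needed to turn haplotype a into b."""
    return sum(abs(x - y) for x, y in zip(a, b))


total_men = sum(count for _, count in SYKES_COUNTS)

# Haplotypes exactly one step from the Sykes haplotype, with their counts.
one_step = [(h, n) for h, n in SYKES_COUNTS
            if step_distance(h, SYKES_HAPLOTYPE) == 1]

if __name__ == "__main__":
    print("Men in sample:", total_men)
    for haplotype, count in one_step:
        print("-".join(map(str, haplotype)), "carried by", count, "men")
```

Under this assumption the sketch picks out two haplotypes, 14-23-11-14 and 15-23-10-14, in agreement with the statement that two haplotypes are one mutation away from the Sykes haplotype.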

If all 27 who did not have the Sykes haplotype were such events, this would correspond to a rate of 1.3% per generation, as compared to a rate of between 2 and 5 percent conjectured for the frequency of non-paternity events in the present English population. Wade comments that the court records suggest that the Sykeses, at least in the 14th century, were a rough lot and quotes Bryan as saying, "Nonetheless their wives were faithful through all of this."

Sykes points out that, if this procedure is found to be successful for other names and markers, this would have applications to genealogy and forensics.

The same kind of DNA fingerprinting was also used recently to provide evidence that Thomas Jefferson fathered a family with his slave Sally Hemings.

We turn now to another application of this technique.

The Lemba people are a black Southern-African population who have long claimed to be of Jewish ancestry. They have based this claim on their oral history and the fact that they have a long tradition of Jewish customs such as circumcision, ritualistic slaughtering of animals and their refusal to eat meat from animals that are even remotely related to pigs.

From the script of the NOVA program we read:

SHAYE COHEN: According to the Bible, God selected the tribe of Levi to be the priestly tribe. And this tribe in turn was split into two. One line from Aaron provided the priests, another line through Moses provided the Levites who would serve as assistants in the central sanctuary. The crucial point is that the priestly line is tribal in the sense that it is transmitted from father to son. I, for example am a priest. My last name is Cohen, or Cohane in Hebrew, and I am a Cohane because my father said I was... You don't have to pass qualifications to be in the priestly line. The only thing that matters is your heredity.

Another Cohane, Karl Skorecki, comments that, while sitting in a synagogue:

Another member of the congregation was called upon as a Cohane, or Jewish priest, and his origin was from North Africa. And my origin in terms of where my parents came from is from Eastern Europe and Poland. So, I thought to myself at the time, well what might we have in common other than this tradition that we have? So this led to the notion that perhaps we could find somewhere in the human genome, a similarity.

This in turn led to the Nature article "Origins of Old Testament priests."

In this study the authors took a sample of 306 male Jews from Israel, Canada, and the United Kingdom. They broke this up into six groups: first according to whether they were identified as Cohens, Levites, or Israelites (not Cohens or Levites). Then, to compare groups widely separated geographically, each group was further classified according to whether the origin of the family was Ashkenazic (Jewish communities of northern Europe) or Sephardic (Jewish communities of North Africa and the Middle East).

As markers, the researchers used six microsatellites (repeats of short letter sequences) and six 'unique-event' polymorphisms (UEPs). They found 112 haplotypes but, despite considerable diversity among the Israelites, they found a single haplotype (repetitions 14,16,23,10,11,12 in their six microsatellites) occurring strikingly frequently in both the Ashkenazic Cohens (50%) and the Sephardic Cohens (56%). They called this haplotype the "Cohen Modal Haplotype." It is assumed then that all the Cohens came from a single Cohen with this haplotype. The frequencies in the Israelites were 13% for the Sephardic group and 10% for the Ashkenazic group, and later research showed that it is rarely seen in non-Jewish populations.

Assuming that the other 50% of the haplotypes for the Cohens resulted from mutations, and that these mutations occur as a Poisson process, the researchers were able to estimate how far back they would have to go to get the variation that they had in their sample. Their estimate was 106 generations which, for a generation time of 30 years, would give an estimate of 3,180 years before the time the sample was taken. Of course, as you can imagine, the confidence interval for their estimate is large. However, this estimate is in good agreement with the Biblical account that Moses assigned the status of priests to the male descendants of his brother Aaron after the Exodus from Egypt, believed to have occurred some 3,000 years ago.
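The arithmetic behind this dating can be sketched in a few lines of Python. This is only an illustration and not the estimator the researchers used, which is not described here. It relies on one standard fact about a Poisson process: if mutations hit a haplotype at rate lam per generation, the probability that a lineage has had no mutation after t generations is exp(-lam * t). Treating "about half the Cohens still carry the modal haplotype" as that probability is our simplifying assumption (it ignores, for example, mutations that later reverse themselves), and the function name implied_rate is ours.

```python
import math

GENERATIONS = 106           # estimate reported in the text
YEARS_PER_GENERATION = 30   # generation time used in the text
FRACTION_UNCHANGED = 0.5    # assumption: about half still carry the modal haplotype

# Convert generations to years, as in the text.
years_before_sample = GENERATIONS * YEARS_PER_GENERATION


def implied_rate(fraction_unchanged, generations):
    """Rate lam per generation solving exp(-lam * generations) = fraction_unchanged."""
    return -math.log(fraction_unchanged) / generations


# Mutation rate per haplotype per generation implied by the simplified model.
rate = implied_rate(FRACTION_UNCHANGED, GENERATIONS)

if __name__ == "__main__":
    print("Years before sample:", years_before_sample)
    print("Implied mutations per haplotype per generation: %.4f" % rate)
```

Read the other way, the same formula shows why the confidence interval is wide: the estimated number of generations is inversely proportional to the assumed mutation rate, so any uncertainty in the rate carries over directly to the date.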

Finally, what has all this got to do with the original problem of the claims of the Lembas? In the American Journal of Human Genetics article, researchers collected data from the Lemba populations and other populations related to the possible Lemba ancestry as well as from the general Jewish population. The researchers used the same markers that were used in the study reported in the Nature article. They found that about 10% of the Lemba population had the Cohen Modal Haplotype, consistent with their findings for the general Jewish population. The Lemba population consists of a number of clans. Of these clans, the Buda clan is the oldest and, for some ritual purposes, the most important. In this clan, more than 50% had the Cohen Modal Haplotype. All of the results are consistent with the Lembas' oral history.

The story of the Lembas and the attempts to verify their oral history both by DNA analysis and by the personal explorations of one of the authors of these studies, Tudor Parfitt, is told and beautifully illustrated in the NOVA video.    
                                          <<<========<<



>>>>>==============>
Here is an account by Dan Rockmore of another chance exploration carried out recently by Dan and Laurie. This is a sequel to a previous exploration in which they visited local weather forecaster Mark Breen (see Chance News 8.04).

Whither the weather balloon?

Recently, our quest to understand the process of weather prediction led us to take a trip to visit the National Weather Service (NWS) Forecast Office in Gray, Maine. Our goal was to witness statistics in action, in the form of the launch of a real, live weather balloon.

Some time ago, our friend Mark Breen, the local Vermont Public Radio forecaster, had shown us how some of his forecasting depended on the regional forecast generated from this NWS office. In turn, their own forecast used the data generated by weather balloons, which are launched every day at seven o'clock in the morning and seven o'clock in the evening at about 70 sites around the country. We couldn't understand how we had never seen one of these balloons. This was clearly a job for the Chance News action team, so off we went to witness a launch, ask some questions and shed some light on the mystery of the missing balloons.

We arrive in nearby Portland, Maine on the evening before our scheduled morning meeting at NWS, just in time to take in a Portland SeaDogs baseball game and have a great dinner at the Fore Street restaurant (order the Loup de Mer if it is on the menu!). We awake at 5:30 AM the next day and are on the road by 6 heading for the weather station. Weather prediction is clearly a coffee-intensive activity.

We wind our way over the local highways on a drizzly and appropriately gray morning in Gray. Our final destination is a medium-sized, reddish brick, official-looking building, which is the regional weather prediction center of the National Oceanic and Atmospheric Administration (NOAA). You can see pictures of the office at

http://www.seis.com/~nws/officepix.html.

For a brief history of the office see

http://www.seis.com/~nws/historyPWM.html.

You'll be interested to discover that the National Weather Service traces its origins to President Ulysses S. Grant, who gave responsibility for its creation to the Secretary of War.

We ring the bell and are let into the center. Weather prediction is a 24 by 7 job and some night-shift scientists are around, as well as the scientist in charge of the sacred balloon launch, Art Lester, one of the hydro-meteorological technicians. In fact Art has already been there for about an hour or so, since it is his responsibility to prepare the balloon at around 6 AM for its 7 AM launch. Art gives us a tour of the operations room, which is basically a computer room where computers are running weather models and weather maps are displayed on every monitor. We ask a bunch of questions about the pictures. The scientists can't help but look and sound like weather forecasters as they respond. Their hands sweep across the monitors as they trace out fronts moving this way and that, lows evolving into highs, and temperature isotherms. Other maps show clouds of precipitation and locate lightning strikes. The weather prediction models are being run in preparation for the construction of the morning regional forecast, which is put together by our host John Jensenius, the Warning Coordination Meteorologist, who arrives at about 6:40.

John takes us aside to show us a weather balloon up close, along with the instrument that it carries into the sky, a radiosonde. The balloon itself is a silky yellowish brown sac which will be filled with helium and released into the air and should rise about 17 miles. The accompanying radiosonde carries sensing instruments encased in styrofoam. The instruments measure the temperature, humidity, and barometric pressure at different altitudes. Wind speed and direction are also inferred by tracking the signal, thereby monitoring the movement of the balloon and radiosonde. The readings are sent back via radio waves - separate frequencies are reserved for the different variables. In fact, as a way of checking that the radiosonde is on-line, the measurements are "played" as a set of tones of different frequencies. This gives new meaning to songs like "You Are My Sunshine" and "Thunder and Lightning".

Launch time is approaching. Everything checks out at the office and now it's time to launch! We go out and get into the car for a quick drive up to the launch site -- it looks like a little observatory, really just like a largish tall garage. We go inside and there is the balloon, tied to the table, with radiosonde tied on to it. The garage door is opened and final preparations are made, checking again to see that the radiosonde is secured and transmitting. The balloon is walked outside and with no ceremony at all, released and it speeds into the sky. With the low ceiling on this cloudy day it is quickly out of sight. Its rate of ascent makes it clear to us why we have never seen one floating lazily in the sky. With the low ceiling, in ten seconds it disappears, and we imagine that even on a clear day it is invisible in less than 30 seconds. As it rushes up to the clouds, the highly malleable skin flattens out in the early morning wind, looking more like a large lumpy pillow (UFO?) than a balloon, and soon it is gone, disappearing into the low clouds on this rainy morning. The radiosonde continues to transmit its song of the weather.

The readings will continue to be sent for about 2 hours or so. After this the balloon usually bursts (this was tested on the ground) and begins to descend rapidly to earth, often ending up in the ocean or a tree somewhere, which is probably why we have yet to come across one in our daily wanderings. In general, those launched on the east coast are rarely found, although in other places, like the landlocked Midwest with its wide-open plains, they are found relatively often. The radiosonde has a little notice on it, assuring anyone who finds it, that the equipment is perfectly safe and asking that it be mailed back (for free) to the regional office whose address is on the label. Each launch costs about 200 dollars, including the money spent for the time of the scientists. It is believed that this process will be automated in the near future, and that ultimately (possibly as soon as ten years from now) as satellite imaging gets better, there may very well come a day when the balloons are unnecessary.

We get back into the car to return to the main building for our debriefing. We continue our discussion with John of the role that the forecast plays, and the many, many tools available now for predicting the weather. John will be using the radiosonde data as well as the most recent model output and satellite data (animated as brief movies made by looping the pictures taken over time by the satellites) to prepare the day's area forecast discussion, which is posted on the web - (see

http://www.seis.com/~nws/mesnhs.html

for the appropriate link as well as other related links). Comparison of the model predictions with incoming weather data helps to generate the forecast. We remember that the regional forecast is used by our own Mark Breen on VPR to help create his local forecast.

John explains that there have been great improvements in the 3-, 4-, and 5-day forecasts, but that beyond that accurate predictions remain elusive. Once again we discuss the age-old bugbear, "What does probability of precipitation mean?" as well as its corollary "If a 20 percent chance of rain is predicted, and it rains, is the forecaster correct?" Here is our e-exchange:

Dan: Given some initial conditions, the model either outputs precipitation, or no precipitation - the algorithm then goes back and checks to see historically what percent of time these initial conditions actually did produce precipitation. Is that right?

John: Yes. The machine-generated probabilities are based on equations developed with forward stepwise multiple linear regression based on a historical sample of past observed data (the predictand) and past model forecasts (the predictors). The regression can select from a considerable number of predictors but usually equations are limited to between 10 and 12 predictors. For the Probability of Precipitation (PoP), the most important predictors are generally the model-forecast relative humidity, model-forecast precipitation occurrence, model-forecast precipitation amount, and model-forecast vertical motion. Some of these are usually transformed into binary and grid-binary formats in addition to being offered in the raw formats. In addition, the model forecasts are space-smoothed to help incorporate some of the uncertainty inherent in the computer-generated forecasts.

Dan: If the weatherman predicts a 20% chance of rain, and it rains, is he right?

John: Only two forecasts can be absolutely wrong. A forecast of a 0% probability of rain when it rains and a forecast of 100% probability when it doesn't rain. All other probabilities must be evaluated in terms of the forecast reliability. To evaluate the reliability, you must gather a sufficient sample of cases of each probability (for example 20%). Then you must determine the observed relative frequency of precipitation for those specific cases. If the observed relative frequency closely matches the forecast probability, then the forecast is reliable. By looking at the observed relative frequencies for each of the forecast probabilities, you can see whether the forecasts are reliable over the entire forecast range of probabilities. One other factor to consider is the ability of a forecast system to discriminate between the rain/no rain cases. To simply always forecast the climatic observed relative frequency of precipitation will give you reliable forecasts given a sufficiently large sample, but will do nothing to discriminate between the rain/no rain cases. Both reliability and discrimination factor into the Brier Score.
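John's two criteria can be illustrated with a short Python sketch. The Brier score used here is the usual mean squared difference between the forecast probability and the outcome (1 for rain, 0 for no rain), with lower scores being better. The ten days of data are hypothetical, invented only for illustration, and the function names are ours.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def reliability_table(forecasts, outcomes):
    """For each forecast probability, the observed relative frequency of rain."""
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    return {f: sum(obs) / len(obs) for f, obs in sorted(groups.items())}


# Hypothetical data for ten days: 1 = rain, 0 = no rain (5 rainy days in all).
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

# Forecaster A says 80% on the first five days and 20% on the last five.
forecaster_a = [0.8] * 5 + [0.2] * 5

# Forecaster B always forecasts the climatological frequency, 50%.
forecaster_b = [0.5] * 10

if __name__ == "__main__":
    for name, forecasts in [("A", forecaster_a), ("B", forecaster_b)]:
        print(name, "reliability:", reliability_table(forecasts, outcomes))
        print(name, "Brier score: %.2f" % brier_score(forecasts, outcomes))
```

In this made-up sample both forecasters are reliable in John's sense: it rains on 4 of A's 5 "80%" days and on 1 of the 5 "20%" days, and on half of B's "50%" days. Only A discriminates between rain and no-rain days, and A's Brier score (0.16) is lower than B's (0.25).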

Finally it's time to leave. We've tracked down the elusive weather balloon, yet unbelievably the joys of discovery are not yet over. As we make our way back to the interstate, we decide to stop for a proper breakfast at a roadside diner, Stones Grove. The six pick-up trucks parked in front are a good omen. We are not disappointed as we enjoy the best home fries we've ever tasted. The diner is full of regulars. We join in as the gang all sings Happy Birthday to Tom the cook who receives an 8 pound torque wrench, sweater, pineapple upside down cake and gift certificate to L.L. Bean for his birthday. As we climb back in the car and head home, we congratulate each other on a perfect trip. Now what were the chances of that?!

DISCUSSION QUESTION:

Read how the Brier Score is defined in an article by Harold Brooks at:

http://www.nssl.noaa.gov/~brooks/media/okcmed.html

Read how Peter Doyle would measure the validity of a weather forecaster at:

http://math.dartmouth.edu/~doyle/docs/email/forecast.txt

What is the difference between these two approaches? Which do you think is better? Compare these two methods as applied to the data given in Harold Brooks's article.
                                         <<<========<<



>>>>>==============>
Chance News Copyright (c) 2000 Laurie Snell. This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
