Please send comments and suggestions for articles to email@example.com.
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site: http://www.dartmouth.edu/~chance
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
Spring is here and sports are in the air, and so is the law of averages.
===========================================================
The law of averages says that what happened last year can't happen again but we still won't take any chances.
Soccer coach Nick Theodorakpoulos
The law of averages says the Pats will get no less than one useful regular from yesterday's first round.
Sportswriter Michael Gee
I always call heads and, in about 45 matches as captain, I reckon it has come up heads on only about eight occasions.... I do not want to change as the law of averages is now firmly on my side.
Brian Lara (Cricket player)
===========================================================
Contents of Chance News 7.04
Note: Keven Jones sent us a nice example of a misleading graph related to an issue in the news (See item 15). We get many of our articles from Lexis-Nexis, which does not have graphs or advertisements, so we probably miss a lot of interesting examples of these. We would appreciate readers calling our attention to graphs or advertisements that we might want to mention in Chance News. Of course, we continue to be interested in hearing about any interesting chance news articles you find.
The practice of therapeutic touch is used in hospitals all over the world and is taught in medical and nursing schools. In this therapy, trained practitioners manipulate something that they call the 'human energy field'. This manipulation is carried out without actually touching the patient's body. Practitioners of this therapy claim that anyone can be trained to feel this energy field.
Some researchers say that there exists no reliable evidence showing that this technique heals patients. Dr. Donal O'Mathuna, a professor of bioethics and chemistry at the Mount Carmel School of Nursing in Columbus, Ohio, has reviewed more than 100 papers and doctoral dissertations on this technique without finding any convincing data.
James Randi, a magician who is a well-known skeptic of some types of alternative medicine, has been trying to test the practice of therapeutic touch for years. Only one person has agreed to submit to his test, but when she was tested she did no better than chance in detecting the energy field.
Emily Rosa, an 11-year-old in Colorado, was able to recruit 21 practitioners of therapeutic touch in an experiment she conducted two years ago. Emily's mother, who is a nurse and a skeptic of this technique, believes that Emily was able to recruit this many subjects because they did not feel threatened by a 9-year-old girl working on a project for a science fair.
Her test consisted of placing a screen between a subject's eyes and hands, and then holding her own hand over one hand or the other of the subject. The premise of this experiment is that if, in fact, the subject can feel Emily's energy field, then the subject should be able to determine over which hand Emily's hand is being held. Emily conducted 280 tests with 21 subjects, and they identified the correct location of her hand in 44% of the tests.
The results of her study were reported this month in the Journal of the American Medical Association. Reaction was swift from proponents of the therapeutic touch technique. Meanwhile, Emily recently received a letter from the Guinness Book of World Records, saying she may be the youngest person ever to publish a paper in a major scientific journal.
(1) One practitioner of therapeutic touch, in response to Emily's results, stated that people who use this technique rely on more than just touch to sense the energy field. They also use 'the sense of intuition and even a sense of sight'. Other users of this method claim that patients who are ill have hot or cold spots in their energy fields in some cases, or have areas that feel tingly. Can you design an experiment that allows the practitioner to use senses of touch, sight, and intuition, and that still tests whether the technique is a valid one?
(2) How likely is so poor a showing as 123 or fewer correct
responses out of 280 tests, given no real ability to detect the
energy field?
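One way to explore question (2) numerically is sketched below. It assumes, as our own simplification, that a subject with no real ability guesses correctly with probability 1/2 on each of the 280 tests, independently of the others. The function name binomial_cdf is ours.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 123 or fewer correct answers out of 280 when every answer is a pure guess
p_value = binomial_cdf(123, 280, 0.5)
print(f"P(X <= 123) = {p_value:.4f}")
```

The independence assumption is questionable here, since each of the 21 subjects was tested several times, so the printed value should be read as a rough guide only.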
The Dallas Morning News has a science page on Monday. Here is an article from last Monday's edition.
In matters of mathematics, albatrosses aren't birdbrains.
The Dallas Morning News, 6 April, 1998, 7D
Suppose that an animal is foraging for food, and the food is unevenly distributed in some planar domain. How should the animal search for the food? Do animals in the real world behave in a mathematically optimal manner? These questions have recently been looked at by Boston University physicist Gandhimohan Viswanathan and some of his colleagues.
To create a mathematical model of the situation, Viswanathan assumes that the target sites are Poisson distributed. The animal begins by choosing a direction at random and a distance from a certain probability distribution. If, during the trip of the given distance, it sees any food, then it stops and eats. At the end of the flight, or after it has eaten, it chooses another direction at random and another distance, and begins another trip. The model also assumes that the foraging is 'non-destructive,' i.e., the animal can visit the same site many times.
A convenient set of distributions to use for the length distributions is the set of Levy distributions, with density function C d^(-u), where C and u are constants and d is the length of the flight. In the model, one defines the foraging efficiency to be the ratio of the number of target sites visited to the total distance traversed by the forager.
When one maximizes the foraging efficiency with respect to u, one obtains a value of u that is very close to 2. One way to understand this result is to note that, if u is large, then the animal usually stays close to its starting point, thereby missing out on many target sites within range that contain food. On the other hand, if u is small, the animal does not take advantage of the fact that the foraging is non-destructive.
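To get a feel for how u controls the flight lengths, here is a minimal sketch that draws lengths from a density proportional to d^(-u). It assumes a smallest possible flight length d_min (needed to normalize the density, and not specified in the article) and u > 1. It does not simulate the foraging itself or compute the efficiency. The function name and parameters are ours.

```python
import random
import statistics


def sample_flight_length(u: float, d_min: float, rng: random.Random) -> float:
    """Draw one flight length from a density proportional to d**(-u) for d >= d_min.

    Uses inverse-transform sampling; u must exceed 1 so the density can be normalized.
    """
    if u <= 1:
        raise ValueError("u must exceed 1")
    uniform = rng.random()  # uniform on [0, 1), so 1 - uniform is never zero
    return d_min * (1.0 - uniform) ** (-1.0 / (u - 1.0))


rng = random.Random(0)
for u in (2.0, 3.5):
    lengths = [sample_flight_length(u, 1.0, rng) for _ in range(100_000)]
    print(f"u = {u}: median length {statistics.median(lengths):.2f}, "
          f"longest flight {max(lengths):.1f}")
```

With the larger value of u the flights stay short, while with u = 2 an occasional very long flight appears, which matches the verbal explanation given above.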
Various sets of data have been obtained to test this model. A study involving bees in an area of low nectar concentration gives an observed value of u = 2, while if there is high nectar concentration, the observed value of u is higher (u = 3.5). Studies involving deer have given rise to values for u of 2.0 and 2.1.
Viswanathan and his colleagues were able to obtain data for albatrosses in the South Pacific by attaching small radio transmitters to the legs of the birds. In this case, they obtained an observed value of u = 2.0. It is interesting to note that although it has long been known that albatrosses will travel great distances while foraging, it has not been possible to track them since they move too fast for ships and too slowly for planes. Evidence now exists that shows that albatrosses sometimes fly from Hawaii to San Francisco in search of food.
Suppose that a small child is lost in the woods. Do you think
that the above model (with u = 2) gives an optimal (or near-
optimal) algorithm for searching the woods for the child?
Initials that spell success.
Daily Mail (London) 31 March 1998, p. 31
What's in a name? Your initials. And a recent study shows that people who have initials that spell 'good' words live longer than those whose initials spell 'bad' words. Researchers at the University of California at San Diego sorted through 27 years of death certificates, amounting to roughly 5 million men.
Those with initials deemed 'good', such as WIN, VIP, and WOW lived 4.48 years longer, on average, than the control group with neutral initials, while those with 'bad' initials such as ILL, ASS, and DUD died 2.8 years earlier than the control group. The article states also that those with inspirational initials lived up to seven years longer and were less likely to commit suicide or die in accidents than those whose initials were unflattering.
(1) What are some of the design problems that you would want to check in a study like this? (We contacted the author, psychologist Nicholas Christenfeld at UC San Diego, and he said that he has not yet written up the study but would have it soon.)
(2) This is obviously an eye-catching study. Why do you think the major newspapers passed it up?
(3) Can you explain why unpleasant initials should result in more
suicides? Do you think that people with unpleasant names (for
example, Snell, who was often called Smell or Stinky) are also in
A number of new articles on the Bible Code controversy can be accessed from Brendan McKay's Torah Codes page. Here are four that we found interesting:
A lecture by Eliyahu Rips.
On the Witztum-Rips-Rosenberg sample of nations. Dror Bar-Natan, Brendan McKay, and Shlomo Sternberg
Torah codes: reality or illusion. A.M. Hasofer
The case against the codes. Barry Simon
The lecture by Rips was given in about 1985 in Russian and is provided with an English translation by McKay to help clarify the history of the Bible Code controversy.
In the Statistical Science article (See Chance News 4.15), the authors, Witztum, Rips and Rosenberg, gave no explanation for how the codes got in the Bible though the reader is surely meant to infer that they came from God. In this lecture, given some 10 years before the Statistical Science article appeared, Rips explains what he had already done and hoped to do to establish the presence of the codes in the Torah and what this will mean. We found his discussion of the effect on free will particularly interesting:
Everything is foreseen but man is still free to exercise his will. How is that possible? Let me put it this way: supposing that today we are watching the repeat of a game that took place last week. Now, you know by now that the game ended with the score 3:2, that the first goal was scored fifteen minutes into the game, and so on. Did our present knowledge hamper the players? No, of course not. Yet to the Almighty, who is outside of time, there is no gap between past and future, that is to say, He is as cognizant of the future as He is of the past. But his knowledge lies outside of our world, and it does not prevent us, those living in this world, from exercising our individual free will. That is a crucial point.
In "On the Witztum-Rips-Rosenberg sample of nations" McKay and his colleagues analyze a sequel to the Statistical Science paper, by the same authors, which has been circulating in preprint form for several years. Instead of matching Rabbis and their dates, this article considers pairs of the form (N,X), where N is the name of a nation and X is some related word or phrase. It is much easier for this study than it was in the Rabbi study to show that many subjective choices were made which, if chosen otherwise, would not have led to a significant result.
Hasofer is an Emeritus Professor of Statistics at the University of New South Wales and has analyzed previous statistical claims for hidden codes in the Torah. In this article, Hasofer supports the criticisms of McKay and others of the WRR studies and adds some of his own criticisms. Hasofer states that this study was a test of hypothesis where the authors provided a null hypothesis but no alternative hypothesis. He points out that the obvious alternative hypothesis -- the codes were put there by God -- is clearly not testable. He states that when an alternative hypothesis that can be tested is not given, all that can be concluded from rejection of the null hypothesis is that the null hypothesis is unlikely to account for the results. (There seems to be general agreement, both among the proponents and the critics, that the results of the Bible Codes studies did not occur by chance!)
Barry Simon's article is a revision of his previous article "A Skeptical Look at the Torah Codes" which was published in the March 1998 Jewish Action. Simon states that, after considerable study of the evidence and the replies of the proponents to his original article, he has gone from being skeptical about the arguments presented for the validity of the codes to being certain that all of the evidence presented so far has no legitimacy. This is an excellent up-to-date discussion of the case against the Bible Codes. Simon provides a detailed discussion of why a study whose design involves subjective choices that cannot be replicated by others cannot be considered a scientific study.
For a lighter but no less interesting discussion of what science is all about, we recommend the new Richard Feynman book "The Meaning of it All: Thoughts of a Citizen-Scientist." (Addison Wesley 1998, Hard cover, 133 pp, $15.40 from Amazon.) This book consists of three lectures Feynman gave at the University of Washington in Seattle in April 1963. In the first lecture Feynman talks about the nature of science with particular emphasis on doubt and uncertainty. In the second he talks about the impact of science on politics and religion. By the third lecture Feynman says that he has run out of organized ideas and so we are treated to some of his "unorganized ideas" on "this unscientific age".
(1) Hasofer remarks that if there is a null hypothesis and an alternative hypothesis, a small probability for the data under the null hypothesis might not lead to accepting the alternative hypothesis because the probability of the data under the alternative hypothesis might be even smaller. Is this relevant for the Bible Codes study?
(2) A coin is tossed 10,000 times and comes up heads 5,150 times. Could you reject, at the 95% confidence level, the null hypothesis that the coin is a fair coin? If the alternative hypothesis is that the coin is biased with a 53% chance of heads, which hypothesis would you accept? It is often suggested that you will be able to reject any null hypothesis with a sufficiently large sample. Why? Do you think this is what Hasofer is really worried about?
(3) In his Jewish Action article, Simon reminds us that a
scientific hypothesis must, at least in principle, be capable of
being disproved. However, it is hard to see how the hypothesis
"There are Bible Codes in the Torah" can be disproved. If it
cannot, this hypothesis is not a scientific hypothesis. What do
you think about this? Does this tell us anything about the
statistical study of Bible Codes?
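For the coin in question (2), the following sketch computes how far 5,150 heads lies from what each hypothesis predicts, using the normal approximation to the binomial, and compares the likelihoods of the data under the two hypotheses. The function names are ours, and this is only one of several reasonable ways to set up the comparison.

```python
from math import log, sqrt


def z_score(heads: int, tosses: int, p: float) -> float:
    """Standardized distance of the observed head count from its mean under p."""
    return (heads - tosses * p) / sqrt(tosses * p * (1 - p))


def log_likelihood(heads: int, tosses: int, p: float) -> float:
    """Binomial log-likelihood, dropping the constant that does not depend on p."""
    return heads * log(p) + (tosses - heads) * log(1 - p)


heads, tosses = 5150, 10_000
z_fair = z_score(heads, tosses, 0.50)
z_biased = z_score(heads, tosses, 0.53)
log_ratio = log_likelihood(heads, tosses, 0.50) - log_likelihood(heads, tosses, 0.53)

print(f"z under p = 0.50: {z_fair:+.2f}")
print(f"z under p = 0.53: {z_biased:+.2f}")
print(f"log likelihood ratio (fair vs. biased): {log_ratio:+.3f}")
```

The observed count is about three standard deviations away from both predictions, in opposite directions, so the data are about equally surprising under either hypothesis.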
It takes a hot goalie to raise the Stanley Cup.
Chance Magazine, Vol. 11, No 1, pp. 3-7
Donald G. Morrison and David C. Schmittlein
A number of playoff series between teams A and B are of the type: the winner is the team that wins 4-out-of-7 games. This occurs in baseball in the World Series, in hockey in the NHL playoffs for the Stanley Cup and in basketball in the NBA playoffs. All three of these will be discussed by Hal Stern in the next issue of Chance Magazine in his column "A statistician reads the sports page".
One way to model these series is by Bernoulli trials with team A winning each game with probability p. The probability that team A wins or that the series is over in x games is a routine problem in an elementary probability course and a good illustration of the use of tree diagrams (See Finite Mathematics on the web pp. 29,157). The probability that team A eventually wins at a particular stage in the series was the subject of the famous correspondence between Pascal and Fermat considered to be the beginning of probability theory. (For their solutions, see "Introduction to Probability" by Grinstead and Snell, Chapter 1, available from the Chance website under teaching aids.) Of course, there are many factors, such as a home team advantage, that call into question the Bernoulli trials model.
This article investigates how well the Bernoulli trials model fits the data for the results of 59 best-out-of-7 National Hockey League playoffs. Looking at the data, the authors see that there are clearly too many 4 game series to be consistent with p = 1/2. They then use the maximum likelihood method to arrive at the Bernoulli trials model with p = .73 for the better team. In terms of the p = 1/2 and p = .73 Bernoulli models, the data gives:
|Series duration||Observed||Expected p = 1/2||Expected p = .73|
A chi-squared test strongly rejects the p = 1/2 model but does not reject the p = .73 model.
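The expected counts under a Bernoulli trials model can be computed directly. The sketch below gives the probability that a 4-out-of-7 series lasts exactly n games when one team wins each game with probability p, and scales it to the 59 series in the study. The function name is ours, and the numbers it prints are our own calculation from the model rather than the figures in the article.

```python
from math import comb


def series_length_distribution(p: float, wins_needed: int = 4) -> dict[int, float]:
    """P(series ends in exactly n games) when one team wins each game with probability p."""
    q = 1.0 - p
    dist = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series winner takes game n and exactly wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        dist[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return dist


n_series = 59
for p in (0.5, 0.73):
    expected = {n: round(n_series * prob, 1)
                for n, prob in series_length_distribution(p).items()}
    print(f"p = {p}: expected number of series by length {expected}")
```

Raising p above 1/2 shifts probability toward short series, which is why the excess of 4-game series pulls the fitted value of p so far from 1/2.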
One explanation for the .73 model would be the "dominant team hypothesis" meaning that one team is just a lot better than the other. The authors observe that performances in the regular season do not support this big a difference between the teams. However, the authors suggest another theory consistent with the .73 Bernoulli model. This is the "hot goalie hypothesis". Teams typically use two or more goalies during the regular season and the coaches often use in the playoffs the goalie who they think is "hot". Perhaps a "hot goalie" makes the difference. The authors then consider the "dominant team hypothesis", the "hot goalie hypothesis" and the "home team advantage".
The games are played in the order AABBABA where A is the team with the better record during the regular 82-game season. This gives team A a "home team advantage" since they can play at home 4 times while B can play at home at most 3 times. However, A is also, by definition, the stronger team in the regular season. The A team did win 78% of the series, reasonably close to the 73% predicted in the Bernoulli model. However, the authors decide that the "home team advantage" in the playoff series is not the reason for this. They conclude, in fact, that the "hot goalie hypothesis" and the "dominant team hypothesis" together (but neither alone) can justify the .73 Bernoulli model.
(1) Our count of the World Series statistics gives:
|Length of series||Observed|
What differences in the data do you see between the baseball example and the hockey example? Do you think you could reasonably fit the baseball data using a Bernoulli trials model? If so, what would your choice of p be?
(2) What objections might be raised against the Bernoulli trials
model for the playoff series in hockey, baseball, and basketball?
Researchers find the first drug known to prevent breast cancer.
The New York Times, 7 April 1998, A1
Lawrence K. Altman
This article reports a breakthrough in the prevention of breast cancer: in a large study, the group of women who took the drug tamoxifen had 45% fewer cases of breast cancer than a group of women who took a placebo or dummy pill. Because of the increased incidence of breast cancer with age, women 60 years and older were allowed to participate in the study based on age alone. Women of ages 35 to 60 were included judging by their previous history. A total of 13,388 women participated in the trial for 5 years.
Results: Out of the women who took tamoxifen, 85 developed breast cancer compared to 154 among the women who took the placebo. Eight participants died from breast cancer; of these, 3 were given tamoxifen and 5 placebo.
The drug has dangerous side-effects, such as risk of uterine cancer and blood-clots. A later article ("Breast Cancer Drug Dilemma", The New York Times, April 14, 1998, F1, Denise Grady) stated that two women taking tamoxifen in the study died of blood clots. The study caused debate right from its inception because of the dangerous side-effects. The findings of the study suggested that women older than 50 benefited most from tamoxifen but also faced the highest risk of side effects. Here is a comparison of the side effects for the two groups:
|Side effect||Tamoxifen group||Placebo group|
|blood clots in lung||17||6|
|blood clots in major veins||30||19|
Federal officials stopped the study when they found that the drug's benefits clearly exceeded what researchers had originally expected. The researchers felt ethically bound to offer tamoxifen to all women in the study. The study was cut short before they accumulated sufficient data to develop guidelines for the usage of tamoxifen. The article by Denise Grady pointed out that a number of questions remain unanswered: At what age should women start taking tamoxifen? Is a 5-year period of drug use sufficient for prevention? Would the risk of side-effects be reduced with a shorter period? Would there be additional risks if tamoxifen was taken for a longer period? Or would it be more effective in the long run?
A letter to the editor from Jon Engerbretson in the New York Times on April 13 points out that there have been too few deaths among women in the study to make a meaningful comparison between the groups, so women and their doctors are left to guess whether the reduced risk of breast cancer is outweighed by other potentially fatal complications. Engerbretson draws a parallel to the short- and long-term studies of the drug AZT. The short-term studies of AZT in America were stopped once it was shown that AZT delayed the onset of AIDS-related infections, before it could be determined whether AZT actually prolonged survival. British scientists who conducted the long-term study, despite criticism on ethical grounds, found that AZT had no impact on long-term survival.
Nigel Hawkes (Americans did the right thing, The Times, 8 April, 1998, Home news) argues that the Americans did the right thing morally since volunteers have the right to know if a better treatment can be found. In return for volunteering, volunteers have the right to know the results immediately. But the American study was stopped before it provided a clear picture of the risks and benefits associated with tamoxifen. What is the correct moment to stop the trial?
Researchers from other countries criticize the Americans for releasing their results early. These critics feel that recruiting volunteers now to provide a statistically significant sample will become difficult and the studies will never get the answers they were designed for.
(1) What do you think it means to say: The group of women who took the drug tamoxifen had 45% fewer cases of breast cancer than a group of women who took a placebo? (We think it means the relative risk is 45%. That is, if a = proportion of those given tamoxifen who got breast cancer and b = proportion of those given placebo who got breast cancer then a/b = .45.)
(2) Neither the article nor the official report of the study gives the number in the study who were given tamoxifen and the number who were given placebo. Can you estimate these numbers from the information given? If so, how? (We think you can and estimated that 7375 were given tamoxifen and 6013 were given placebo.)
(3) Do you think it is still ethical to conduct long-term tamoxifen trials using placebo?
(4) In an op-ed piece (The Gazette (Montreal), 21 April 1998,
B3, Sharon Batt) Batt writes that the study reported a 45%
reduction in relative risk for those who took tamoxifen. She
points out that the report could have described this as a one percent
reduction in absolute risk: of the women on placebo, about 98%
were still free of breast cancer after an average of four years of
the study, while in the group taking tamoxifen about 99% were free
of the disease when the trial was stopped. She suggests that the
reaction to the absolute risk would have been very different than
to the relative risk. Do you agree? Should the report of the
study have also provided the absolute risk?
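The estimate in question (2) can be reproduced with a little algebra, sketched below. It takes our reading a/b = .45 from question (1) as given, together with the 85 and 154 cases and the total of 13,388 women, and solves for the two group sizes. A different reading of "45% fewer cases" would give different group sizes. The function name is ours.

```python
def split_group_sizes(total: int, cases_treated: int, cases_placebo: int,
                      relative_risk: float) -> tuple[float, float]:
    """Solve n_t + n_p = total with (cases_treated / n_t) / (cases_placebo / n_p) = relative_risk."""
    ratio = relative_risk * cases_placebo / cases_treated  # this equals n_p / n_t
    n_treated = total / (1.0 + ratio)
    return n_treated, total - n_treated


n_t, n_p = split_group_sizes(13388, 85, 154, 0.45)
risk_t, risk_p = 85 / n_t, 154 / n_p

print(f"tamoxifen group: about {round(n_t)}, placebo group: about {round(n_p)}")
print(f"absolute risk: {risk_t:.2%} (tamoxifen) vs {risk_p:.2%} (placebo), "
      f"difference {risk_p - risk_t:.2%}")
```

The last line puts the absolute risks next to the relative risk, which is the comparison raised in question (4).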
Studies show another drug can prevent breast cancer.
The New York Times, 21 April, 1998
Lawrence K. Altman
Preliminary results from two new studies suggest that a new drug, raloxifene, can reduce the relative risk of breast cancer by about the same amount as tamoxifen, that is by about 50%, and appeared not to increase the occurrence of uterine cancer, a side effect of tamoxifen.
These two new studies involved several thousand post-menopausal women and were conducted at a number of medical centers in the United States and Europe. The studies were sponsored by Eli Lilly & Company, which sells raloxifene as Evista. This drug has been approved for osteoporosis, and the studies were designed to test the drug's effect on bones. The incidence of breast and uterine cancers was determined only as so-called secondary end-points.
National Cancer Institute (NCI) director Dr. Richard D. Klausner called the findings "important and encouraging" but stated that a head-to-head study of raloxifene and tamoxifen "is absolutely required" before general recommendations can be made. The NCI has designed such a study, expected to begin later this year.
(1) According to the article, some cancer experts have expressed doubt that researchers will be able to enroll enough women for the projected study to compare raloxifene and tamoxifen. Why are they concerned?
(2) Why might the fact that the incidence of breast cancer and
uterine cancer were "secondary end-points" make these studies less
definitive as an indication of the drug's ability to prevent these
diseases? Should we be worried about the fact that these studies
were sponsored by the company that makes the drug?
Games: Goats and cars revisited.
The Independent, 28 March 1998, p14.
Yes, we have found one more piece on the deathless Monty Hall problem! Apparently an earlier article in the Independent describing the problem led to "a considerable postbag." Here is what one reader asked:
Let's freeze the game show at the point when the host asks the contestant if he would like to change his mind. To avoid clutter, let's remove the door already shown to have a goat behind it. As far as the contestant is concerned, the circumstances remain unchanged, i.e. the door initially chosen by him has only one-third of a chance of having the star prize behind it. Now let's bring in a passer-by from the street, show him the remaining two doors, and give him a go at the star prize. The odds for the passer-by cannot but be 50-50! But this is impossible, because the odds must be the same for the passer-by and the initial contestant. Unless probability is a subjective concept, in which case a whole chapter of maths goes down the plughole. Can you help? (I'd like to get some sleep!)
DISCUSSION QUESTION: Can you help?
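A simulation may help the sleepless reader. The sketch below assumes the standard rules: the host always opens a goat door other than the contestant's pick, choosing at random when he has two to choose from. The passer-by, who does not know which door the contestant chose, picks one of the two remaining doors at random. The function names are ours.

```python
import random


def play(rng: random.Random) -> tuple[bool, bool, bool]:
    """Play one game; return wins for (staying, switching, the uninformed passer-by)."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a goat door that is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != prize and d != first_pick])
    other = next(d for d in doors if d != first_pick and d != opened)
    # The passer-by sees only two closed doors and picks one at random.
    passerby_pick = rng.choice([first_pick, other])
    return first_pick == prize, other == prize, passerby_pick == prize


def simulate(trials: int, seed: int = 0) -> tuple[float, ...]:
    """Fraction of games won under each of the three strategies."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(trials):
        for i, won in enumerate(play(rng)):
            totals[i] += won
    return tuple(t / trials for t in totals)


stay, switch, passerby = simulate(100_000)
print(f"stay: {stay:.3f}  switch: {switch:.3f}  passer-by: {passerby:.3f}")
```

The contestant and the passer-by face the same two doors but hold different information, so there is no contradiction in their having different chances of winning.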
Post-Script. Mike Olinick has pointed out to us that a critique
of Marilyn vos Savant's solution of this (and other problems posed
in her column) can be found at the "Marilyn is
Survey: graduating is good for your health.
The Boston Globe, 3 April 1998, pA25.
According to the Centers for Disease Control, college graduates feel better emotionally and physically than do high school dropouts. Among the factors cited in favor of graduates were their having better jobs, taking better care of themselves and having better access to health care. These conclusions were based on state-by-state telephone interviews, conducted from 1993 to 1996, in which 431,996 people nationwide were asked how healthy they felt physically and emotionally over a 30-day period.
Overall, it was found that college graduates felt healthy an average of 26 days a month, compared with 23.8 days a month for dropouts. Results varied by state. South Dakota came out best in the nation, with graduates averaging 27.1 days a month of feeling healthy. Dropouts there averaged 23.8 healthy days. Lance Parker of South Dakota's Public Health Department cited the absence of the hassles of urban life as an explanation. By way of contrast, Kentucky fared worst, with college graduates averaging 23.7 days a month, and dropouts 23.1 days. Greg Lawler of the state health department noted that Kentucky has the highest smoking rate in the country, adding that "you could look at just that factor alone and know we're going to have health problems."
The article reports that California, Colorado, Florida, Indiana, Kentucky, Massachusetts, Michigan, Nevada, Oregon and Rhode Island had a lower average number of healthy days than the rest of the country. Connecticut, Georgia, Hawaii, Illinois, Iowa, Kansas, Maine, Maryland, New Jersey, North Carolina, Ohio, Oklahoma, South Dakota and Tennessee had higher-than-average healthy days. The CDC said that the report doesn't explain the differences among states, but that it should help states improve or add to existing programs.
(1) How many days in the last 30 did you feel well? How reliable do you think people's memory can be on these figures?
(2) The CDC said the report may underestimate a state's health problems because not everyone has a phone or is willing to confess their habits. How does each of these bias the estimate?
(3) Would you expect Kentuckians to share some of the benefits of rural living reported for South Dakota? What do you make of this?
(4) If the report doesn't explain the differences, how can it be
expected to help states improve?
Supreme Court calls polygraphs unreliable.
The Boston Globe, 1 April 1998, pA1.
John Aloysius Farrell
In an 8-1 vote, the Supreme Court has upheld a 1991 order, signed by President Bush, that banned polygraphs in the US Military Rules of Evidence. The order had been challenged by Edward Scheffer, who was convicted of using illegal drugs and was discharged from the Air Force. At his trial, a military court had refused to accept results from a lie-detector test that supported his claim of innocence.
Writing for the majority, Justice Clarence Thomas stated that "There is simply no consensus that polygraph evidence is reliable. Although the degree of reliability of polygraph evidence may depend upon a variety of identifiable factors, there is simply no way to know in a particular case whether a polygraph examiner's conclusion is accurate." Thomas estimated the reliability of polygraph tests as 50 to 87%. Justice John Paul Stevens, the lone dissenter in the ruling, thought that the tests were reliable 85 to 90% of the time.
Although the present decision relates to military courts, the deliberations were watched with interest by civilian lawyers. The National Association of Criminal Defense Lawyers and the American Polygraph Association were among those groups who filed friend-of-the-court briefs, urging the Court to allow defendants to introduce lie-detector results in their own defense.
(1) What do you think the percentages cited by Thomas and Stevens refer to? Is Thomas using "accuracy" and "reliability" as synonyms?
(2) Do you think any level of these figures--short of 100%--would make polygraphs acceptable to the Court?
(3) What implications do you see from allowing defense lawyers to
introduce lie-detector results but forbidding prosecutors to
US ranks No. 1 in deaths by gun, CDC finds.
The Boston Globe, 17 April 1998, pA3.
Chelsea J. Carter
A study by the Centers for Disease Control, appearing in the International Journal of Epidemiology, found that in 1994 the US had the highest rate of gun deaths (including murders, suicides and accidents) among the world's 36 wealthiest nations. The US rate was 14.24 per 100,000 people. Japan's rate of 0.05 per 100,000 was the lowest. The US alone accounted for 45% of the 88,649 gun deaths reported overall in the study.
Although the CDC did not speculate on why the rates varied, gun control advocates were quick to blame the easy accessibility of firearms in this country. The National Rifle Association, on the other hand, criticized the study for not considering all causes of violent death. An NRA researcher was quoted as saying "What this shows is the CDC is after guns. They aren't concerned with violence. It's pretending that no homicide exists unless it's related to guns."
(1) The article reports that Japan averages 124 gun-related attacks a year, less than one percent of which are fatal. What do you make of this in light of the figures above?
(2) What fraction of the gun deaths reported here do you think
are homicides? What fraction of homicides in the US do you think
are gun-related? What is the relevance of these two questions to
the points raised above?
Risk of false alarm from mammogram is 50% over decade.
The New York Times, A16
A recent study concluded that women who receive mammograms every year for a decade run a 50-50 chance of receiving a false-positive result.
The study, conducted over a 10-year period, examined screening results for 2400 women who were 40 to 69 years old. A total of 9762 screening mammograms and 10905 screening breast examinations were performed, for a median of 4 mammograms and 5 clinical breast examinations per woman over the 10-year period. If a woman had a mammogram or breast examination that (1) was interpreted as indeterminate, (2) aroused suspicion of cancer, or (3) prompted a recommendation for additional workup, and cancer was not diagnosed within the next year, the test was considered to be a false positive.
Of the women who were screened, 28.8% of the women had at least one false positive mammogram, 13.4% had at least one false positive breast examination, and 31.7% had at least one false positive result for either test. The estimated cumulative risk of a false positive result was 49.1% after 10 mammograms and 22.3% after 10 breast examinations.
Source: New England Journal of Medicine 1998; 338: pp 1089-96
Has the study overestimated the frequency of false positive results? An editorial in NEJM accompanying the study, written by Harold Sox, points out that an abnormal result would become false positive by default if a diagnosis was not made within a year. An error could occur if a patient with breast cancer did not undergo a definitive diagnostic procedure within one year.
Could the study have underestimated the risks of a false positive? In another article (Study warns of mammogram false alarms, Los Angeles Times, 16 April 1998, A1) we read that in the study, 6.5% of mammograms showed an abnormality whereas nationally about 10% of mammograms show an abnormality.
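To see how a per-test rate compounds over repeated screening, here is a rough sketch in Python. It is not the method used in the NEJM study. It assumes (our assumptions, not the study's) that every abnormal reading is a false positive, that the rate is the same at every screening, and that successive screenings are independent. The function name cumulative_risk is ours.

```python
def cumulative_risk(per_test_rate: float, n_tests: int) -> float:
    """Chance of at least one false positive in n_tests screenings.

    Assumes independent screenings with a constant per-test rate,
    which is a simplification and not the study's own model.
    """
    return 1.0 - (1.0 - per_test_rate) ** n_tests


# 6.5% is the abnormality rate in the study, 10% the national figure
# quoted by the Los Angeles Times; both are used here as stand-ins
# for the per-test false-positive rate.
for rate in (0.065, 0.10):
    risk = cumulative_risk(rate, 10)
    print(f"per-test rate {rate:.1%}: risk after 10 mammograms = {risk:.1%}")
```

Under these assumptions the 6.5% rate gives a ten-test risk close to the study's 49.1% estimate, while the 10% national rate gives a noticeably higher figure.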
The editorial by Sox states: "Periodic screening invites repeated exposure to the possibility of a false-positive result." It also mentions that "the way to reduce the rate of false positive screening results is to raise the threshold for calling results abnormal. Doing so almost always means a higher rate of false negative results."
The article in the Los Angeles Times similarly reports that "False positives are more common among women in their forties, and the incidence declined in each subsequent decade of life."
(1) We read in an article in the Boston Globe (Breast cancer
'false positive' rates survey, 16 April 1998, A1, Dolores Kong)
that a critic of the study, Dr. Daniel Kopans, suggests that the
"numbers are a little funny." He questions the figure of a 50%
risk of a false positive for women who have had 10 mammograms,
pointing out that none of the 2400 women in the study actually
underwent 10 mammograms. He feels that this extrapolated estimate
should have been dropped from the study. What do you think about
using extrapolated figures to explain this kind of result?
Mutual funds report; slice, dice, and scrutinize.
The New York Times, 5 April, 1998, Section 3, p45
This article describes some of the measures used by the Wall Street crowd in an attempt to maximize returns and minimize risk simultaneously (impossible, of course). One way to measure risk is to compute the standard deviation of the annual returns of a stock (or mutual fund) over the past n years. For example, if n = 10, and if, over the past 10 years, the stock has had an average annual return of 15%, then the standard deviation would measure the degree to which the 10 annual returns deviated from this average. We note here for later use the fact that this measure depends upon past performance.
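As an illustration of the standard-deviation measure just described, here is a small Python sketch. The ten annual returns are hypothetical numbers chosen so that they average 15%; they are not data for any real stock. We use the sample standard deviation (dividing by n - 1); the article does not say which version Wall Street prefers.

```python
import statistics

# Hypothetical annual returns (in percent) for the past 10 years,
# made up for illustration so that their average is 15%.
annual_returns = [22.0, 8.0, 31.0, -4.0, 15.0, 19.0, 27.0, 3.0, 12.0, 17.0]

average = statistics.mean(annual_returns)
# Sample standard deviation: how far the yearly returns typically
# stray from their average.
risk = statistics.stdev(annual_returns)

print(f"average annual return: {average:.1f}%")
print(f"standard deviation:    {risk:.1f}%")
```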
Lipper Analytical Services, a mutual fund research firm, has developed a new technique, called the 'performance range,' for measuring the risk of a stock. This technique consists of considering the rates of return of the stock over each of the 58 (overlapping) three-month periods during the past 5 years. The performance range is the difference between the best and the worst of these rates of return; the bigger the performance range, the more volatile the stock. Note that once again, this number depends upon past performance.
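The performance range can be sketched in a few lines of Python. This is our reading of the description above, not Lipper's actual procedure. We assume that the input is 60 monthly returns and that a three-month return is obtained by compounding three consecutive monthly returns. The monthly returns below are simulated (hypothetical), and the function names are ours.

```python
import random


def three_month_returns(monthly_returns):
    """Compounded return for every overlapping three-month window."""
    windows = []
    for i in range(len(monthly_returns) - 2):
        growth = 1.0
        for r in monthly_returns[i:i + 3]:
            growth *= 1.0 + r
        windows.append(growth - 1.0)
    return windows


def performance_range(monthly_returns):
    """Best three-month return minus worst three-month return."""
    windows = three_month_returns(monthly_returns)
    return max(windows) - min(windows)


# Hypothetical data: 60 simulated monthly returns (5 years).
rng = random.Random(0)
monthly = [rng.gauss(0.01, 0.04) for _ in range(60)]

print("number of three-month periods:", len(three_month_returns(monthly)))
print(f"performance range: {performance_range(monthly):.1%}")
```

Note that 60 months do give 60 - 3 + 1 = 58 overlapping three-month periods, in agreement with the count in the article.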
Now things get complicated. There are two other quantities that have been used for a long time to measure performance. The beta of a stock is the result of comparing the stock price to a relevant benchmark index. If a stock typically rises (and falls) by a greater percentage than the index, the beta of this stock is greater than 1, otherwise it is less than 1. The numerical value of beta is thus the ratio of the average percentage change of the stock and the average percentage change of the index, over some period of time. Again, beta depends upon past performance.
If a stock has a beta of 0.5, say, then the stock is "half as risky" as the corresponding index. Although the phrase "half as risky" has little meaning to this reviewer, it is taken to mean that, for example, if the index goes up 20%, then one might expect that stock to rise 10%. Now suppose that in this example, the stock actually goes up 15%. Then the stock has out-performed its expected return by 5%, so this stock is given an alpha (yet another measure of performance) of 5.
But wait a minute, you say: if the stock really goes up 15%, and the index goes up 20%, then a measure of its beta over the last year is really .75, and its alpha (of course) is then 0. Confusing, isn't it?
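The arithmetic in the last two paragraphs can be checked with a short Python sketch. It uses the simplified definitions given in this review (beta as a ratio of percentage changes, alpha as actual minus expected return). In practice beta is usually estimated by regressing stock returns on index returns, which this sketch does not attempt. The function names are ours.

```python
def expected_return(beta, index_return):
    """Return predicted for the stock from the index move and beta."""
    return beta * index_return


def alpha(stock_return, beta, index_return):
    """Amount by which the stock beat its beta-based expectation."""
    return stock_return - expected_return(beta, index_return)


def realized_beta(stock_return, index_return):
    """Beta recomputed from a single period, as a simple ratio."""
    return stock_return / index_return


index_up, stock_up = 20.0, 15.0  # percent, as in the example above

print(alpha(stock_up, 0.5, index_up))       # 5.0, using the old beta of 0.5
new_beta = realized_beta(stock_up, index_up)
print(new_beta)                             # 0.75
print(alpha(stock_up, new_beta, index_up))  # 0.0, using the updated beta
```

The same 15% return thus earns an alpha of 5 or of 0, depending on whether beta is taken from before or after the period in question.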
The basic point to be made here is that, although stock brokerage houses and mutual fund companies are fond of saying (and are required by law to say) that "past performance is no indication of future returns," all of these measurements are nevertheless based upon past performance. If one wants to "manage risk," then one buys stocks (or mutual funds) with low betas, under the assumption that the beta for that stock will be relatively constant in the future. However, one could argue that if one is investing for the long haul (something everyone will tell you is the correct way to invest) then one should invest in those stocks that have had the highest rates of return, based upon the seemingly equally good assumption that these stocks will do better than average in the future, just as they have in the past.
Wow, I (Charles Grinstead) feel much better having gotten that off of my chest! To quote from the latest Vanguard/Wellington Annual Report, "When the rewards of investing are so visible, there is a danger that investors will discount the risks inherent to financial markets. Suffice it to say that investing is not a one-way street, a fact that the markets will make clear from time to time. Nonetheless, the greatest risk is not investing in the first place." Right!
(1) It would be interesting to know if any of these measurements (performance range, beta, alpha) are at all stable over time. This would make a nice question for someone who has access to historical stock prices to pursue.
(2) What would you say about a measurement, such as the performance range, if it changes quite a bit over time?
(3) There are other ways to smooth out fluctuations in the price of a stock, for the purposes of making predictions. Perhaps the easiest way is to compute a moving 5-year average of the price. Can you think of any reason to prefer the performance range measurement to this moving average?
(4) Does the "random walk hypothesis" for stocks require us to
believe that the standard deviation for a stock does not change
From our readers:
Keven Jones wrote about a bar graph provided by the U.S. Department of Justice at their Drug Enforcement Administration (DEA) web site
This is a bar graph with bars labeled from 1992 to 1996. The heights of the bars increase from 10% in 1992 to almost 80% in 1996.
Keven writes:
I find the graphic above misleading, though it is captioned where you can figure out that the situation is not perhaps as dire as DEA would like to have you believe from looking at the graphic. I cannot help but think this was a deliberate attempt to scare (most) people into thinking that in 1996 almost 80% of eighth graders had used illicit drugs, instead of an 80% increase from the 1991 level (i.e., the level almost doubled).
A mail poll is conducted. Let the first question be: number of children. The responses from families with 2 children are lumped together; the responses from families with 0, 1, 3, 4, ... children are thrown out (put in a separate pile). Let the next question of the poll be: are there any boys in your family? Keep the forms that reply yes. Of this remaining pile, 1/3 will have 2 boys.
DISCUSSION QUESTIONS:
(1) Do you agree that this is a good example for the classical 1/3 answer?
(2) Steven goes on to ask
If we didn't throw away the first pile, but simply kept all those that answered "yes" to "are there any boys in your family?", what fraction would have 2 boys? It seems we would have to know something about the distribution of family sizes, or number of children. What is the shape of this distribution? Suppose it's Poisson with a mean of 1.5 (or whatever the mean is). This seems like an interesting problem.
Any thoughts on this?
Editors' comment: We reported on an experiment of this type in Chance News
6.12. Will Lassek wrote a letter to Marilyn vos Savant reporting that
the Census Bureau from 1987 to 1992 interviewed 342,018 households in
its yearly National Interview (random sample) Survey. From this data he
estimated that 34% of families with two children who had at least one
boy had two boys. This data would also provide the information needed
for Steven's problem (2) -- a good project for someone.
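For readers who would like to experiment with Steven's problem (2) before turning to real data, here is a Python sketch under his Poisson suggestion. The assumptions are ours: the number of children is Poisson with mean 1.5, each child is independently a boy with probability 1/2, and "have 2 boys" is read as "exactly 2 boys." The Census figure of 34% quoted above suggests the equal-probability assumption is only approximate. The function names are ours.

```python
from itertools import product
from math import comb, exp, factorial


def two_child_fraction():
    """Classical case: among two-child families with at least one boy,
    the fraction with two boys, found by listing equally likely cases."""
    families = list(product("BG", repeat=2))
    with_boy = [f for f in families if "B" in f]
    two_boys = [f for f in with_boy if f == ("B", "B")]
    return len(two_boys) / len(with_boy)


def poisson_pmf(k, mean):
    """Probability that a Poisson variable with this mean equals k."""
    return exp(-mean) * mean ** k / factorial(k)


def fraction_with_two_boys(mean_children, max_children=60):
    """Among all families with at least one boy, the fraction with
    exactly two boys, when family size is Poisson (sum cut off at
    max_children, beyond which the probabilities are negligible)."""
    p_at_least_one_boy = 0.0
    p_exactly_two_boys = 0.0
    for n in range(max_children + 1):
        weight = poisson_pmf(n, mean_children)
        p_at_least_one_boy += weight * (1.0 - 0.5 ** n)
        p_exactly_two_boys += weight * comb(n, 2) * 0.5 ** n
    return p_exactly_two_boys / p_at_least_one_boy


print("two-child families:", two_child_fraction())
print("Poisson family sizes, mean 1.5:", fraction_with_two_boys(1.5))
```

Under these assumptions the answer for the full pile comes out near 1/4 rather than 1/3, largely because one-child families with a boy are now included. A different mean, or a different family-size distribution, can be tried by changing the argument or replacing poisson_pmf.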
Copyright (C) 1998 Laurie Snell
This work is freely redistributable under the terms of the
GNU General Public License as published
by the Free Software Foundation.
This work comes with ABSOLUTELY NO WARRANTY.