CHANCE News 5.04
(28 Feb. 1996 to 28 March, 1996)
Prepared by J. Laurie Snell, with help from William Peterson, Fuxing Hou, Ma.Katrina
Munoz Dy, and Joan Snell, as part of the CHANCE Course Project supported by the National
Please send comments and suggestions for articles to email@example.com.
Back issues of Chance News and other materials for teaching a
CHANCE course are available from the Chance web site:
Chance favors the prepared mind.
from the movie: "Under siege part 2"
About Bill Massey's quote ("when you listen to popcorn pop are you hearing the central
limit theorem?"): Erol Pekoz remarked "I think that you are really hearing the Glivenko-Cantelli
Theorem." (The empirical distribution is a good approximation to the distribution being sampled when the number of samples is large.)
In a first, 2000 census is to use sampling.
The New York Times, 29 February 1996, A16
Steven A. Holmes
For the year 2000 census, the Census Bureau will not attempt the traditional complete
enumeration of the population but rather will incorporate sampling into the enumeration.
They plan to obtain information from about 90% of the households from interviews and questionnaires that can be returned by mail, telephone, and possibly even the
web. Then a 10% sample will be used to estimate the remaining population. It is
stated that this will decrease the projected cost of the 2000 census from $4.8 billion
to $3.9 billion and could result in a more accurate count.
Some are concerned that sampling violates the constitution's requirement for an "actual
enumeration". Here is what the constitution says:
Representatives and direct taxes shall be apportioned
among the several states which may be included within
this union, according to their respective numbers,
which shall be determined by adding to the whole number
of free persons, including those bound to service for a
term of years, and excluding Indians not taxed, three
fifths of all other Persons. The actual Enumeration
shall be made within three years after the first
meeting of the Congress of the United States, and
within every subsequent term of ten years, in such
manner as they shall by law direct.
It is well known that attempts at complete enumeration lead to an undercount of the
population. This undercount is more severe for certain groups, including minorities.
This is an important issue since the census count of the population can change representation in Congress and the amount of federal money available to a state.
In the last census, after the official census was presented the Census Bureau estimated
the undercount and arguments were made to adjust the official count taking into account
the undercount estimate. After much debate, the government decided not to do this. This decision was challenged in the courts by states adversely affected by it.
The government's decision has, just recently, been supported by a ruling of the
To avoid this problem in the year 2000 Census, the Bureau is planning to do all of
the statistical analysis before reporting an integrated final census count by the
deadline needed for forming new districts. The Census Bureau is currently experimenting
with the most efficient method to evaluate the undercount using a practice 1995 census
They are considering two possible methods for estimating the undercount: one is called
the capture-recapture method, which they have used in previous censuses, and the other
is a new method called CensusPlus. A working paper by Tommy Wright in the Census
Bureau suggests the following kind of simulation to compare these two methods.
It is desired to find the number who live in a six-block area A. The census carries
out an initial enumeration in this area such that each person is counted independently
with probability .85. To estimate the true number in this area, a second independent enumeration is made of an area B consisting of 2 blocks chosen at random. More effort
is put into this enumeration so that each person is now counted with probability
The capture-recapture method estimates the true population of the area A by multiplying
the number counted in A by the first enumeration by the factor:
(number counted in B by the second enumeration)/(number counted in B by both enumerations).
The CensusPlus method estimates the number in the area A by multiplying the number
found in A by the first enumeration by the factor:
(number found in B by either the first or second enumeration)/(number found in B by
the first enumeration).
We assume that we know the true number in each of the blocks and carry out a number
of simulations to see which method results in the better estimate for the true population.
Charles Grinstead wrote two Mathematica programs for us, Census and Goldfish, to do this. Using these programs
we found that the two methods give very similar results, suggesting that the decision
about which to use will depend upon practical matters relating to actually doing the sampling.
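The comparison described above can be sketched in a few lines of Python. This is not Grinstead's Mathematica program, only a rough illustration of the same idea. The block sizes and the second-enumeration probability `p_second = 0.95` are assumptions made for the example; the description above fixes only the first-enumeration probability of .85 and the choice of 2 blocks out of 6.

```python
import random

# Hypothetical true populations of the six blocks in area A (illustrative only).
BLOCK_SIZES = [120, 95, 150, 80, 110, 130]


def simulate_once(block_sizes, p_first=0.85, p_second=0.95, rng=random):
    """Return (capture-recapture estimate, CensusPlus estimate) for area A.

    p_second is an assumed value for the more careful second enumeration.
    """
    # First enumeration of every block in A: True means the person was counted.
    first = [[rng.random() < p_first for _ in range(n)] for n in block_sizes]
    counted_in_a = sum(sum(block) for block in first)

    # Second, independent enumeration of area B: two blocks chosen at random.
    n_first = n_second = n_both = n_either = 0
    for i in rng.sample(range(len(block_sizes)), 2):
        for in_first in first[i]:
            in_second = rng.random() < p_second
            n_first += in_first
            n_second += in_second
            n_both += in_first and in_second
            n_either += in_first or in_second

    capture_recapture = counted_in_a * n_second / n_both
    census_plus = counted_in_a * n_either / n_first
    return capture_recapture, census_plus


def compare(block_sizes, trials=1000, seed=1996):
    """Mean and root-mean-square error of each estimator over many trials."""
    rng = random.Random(seed)
    truth = sum(block_sizes)
    results = [simulate_once(block_sizes, rng=rng) for _ in range(trials)]
    summary = {}
    for name, column in zip(("capture-recapture", "CensusPlus"), zip(*results)):
        mean = sum(column) / trials
        rmse = (sum((x - truth) ** 2 for x in column) / trials) ** 0.5
        summary[name] = (mean, rmse)
    return truth, summary


if __name__ == "__main__":
    truth, summary = compare(BLOCK_SIZES)
    print("true population:", truth)
    for name, (mean, rmse) in summary.items():
        print(f"{name}: mean estimate {mean:.1f}, RMSE {rmse:.1f}")
```

Changing `BLOCK_SIZES`, `p_first`, or `p_second` shows how sensitive each estimator is to the quality of the two enumerations.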
(1) The article actually said "after census-takers contact about 90 percent of
the population, the bureau will use sampling techniques to round out the total".
How would they know when they had 90% of the population? If they did know they had
90% of the population why would they have to do more?
(2) Do you think the constitution permits sampling?
(3) If constitutionality is not an issue, do you think it would be a good idea to
just take a 10% sample of the entire population? What are some of the problems in
(4) The classical application of the capture-recapture method is to count fish in
a lake. You select c fish, tag them, throw them back and then recapture r fish and
count the number t of fish tagged in this recaptured sample. Then estimate the number
in the lake as cr/t. What assumptions are made in this model? What could go wrong with
these assumptions? What would be the corresponding things that could go wrong using
the capture-recapture method in the census? Would the same problems occur if the
CensusPlus method were used?
(5) We recently tried in class to use the capture-recapture method to estimate the
number of crackers in a bag of Pepperidge Farm goldfish crackers (about 330). We
used c = 50 and r = 40. When we simulated this 100 times using Grinstead's capture-recapture program, the histogram of the estimates was concentrated on a very few numbers
and one estimate was 2000. What is going on here?
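One way to explore question (5) is to rerun the experiment on a computer. The sketch below is not Grinstead's program. It assumes a bag of exactly 330 crackers (the approximate figure quoted above) and that the recaptured crackers are drawn uniformly without replacement. The function names are made up for the illustration.

```python
import random
from collections import Counter


def capture_recapture_estimate(c, r, t):
    """Estimate the population as c*r/t; no finite estimate exists when t == 0."""
    return c * r / t if t > 0 else float("inf")


def simulate_estimates(population=330, c=50, r=40, trials=100, seed=1996):
    """Tally the estimates produced by repeated capture-recapture trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Label the tagged crackers 0..c-1, then draw r crackers without replacement.
        recaptured = rng.sample(range(population), r)
        t = sum(1 for cracker in recaptured if cracker < c)
        estimates.append(capture_recapture_estimate(c, r, t))
    return Counter(estimates)


if __name__ == "__main__":
    for estimate, count in sorted(simulate_estimates().items()):
        print(f"{estimate:8.1f}  {'*' * count}")
```

Because t is a whole number, every finite estimate has the form 2000/t, which is worth keeping in mind when looking at the histogram.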
Are all those tests really necessary?
Washington Post, 5 March 1996, Z7
Standardized tests in elementary and secondary schools more than doubled between 1960
and 1989 while student enrollments increased by only 15 percent. The increase was
especially large in public schools since these schools want to show legislators that
millions of dollars in education money is being effectively used and also to meet the
federal guidelines of Goals 2000. This article discusses how these tests are used
and some of the benefits from their use and some of the concerns that educators have
about their use. Here are some of the experts' concerns:
Test results can set unreasonable expectations
for children or diminished expectations, both of
which are not good.
It is not a rare occurrence to find test scores
going up and to erroneously infer from this that kids are
learning more when in fact they are just doing
better on tests.
Tests are like taking a child's temperature. They
don't provide enough information to know if a child
is sick or not, they are just one piece of a diagnostic.
Children who score high on tests are tracked into
more stimulating classroom environments where they
learn more, which leads to a kind of self-fulfilling
This article was apparently inspired by the following report just released by the
National Academy Press. It did not say much about the report so we will comment on
the report separately.
The use of IQ tests in special education.
Board on testing and assessment.
National Research Council
Edited by Patricia Morrison, Sheldon H. White, and Michael J. Feurer, and published
by the National Academy Press 1996.
The use of IQ tests in decisions about the placement of children in special education
programs has long been controversial. More than 20 years ago, a landmark case,
Larry P. v. Riles, resulted in the prohibition of the use of IQ tests with African
American children in California. At the time of this case, IQ tests were used in determining
if a child should be classified EMR (educable mentally retarded) and placed in separate
special classes. The court found that IQ tests were biased against minorities and that the separate classes for EMR students were dead ends. The issues raised
are still current and at the center of another case now being tried in California.
The U.S. Department of Education requested this study to help in making policy
decisions relating to special education.
The report summarizes the comments of a number of experts in the field of testing
who made presentations at two workshops. Richard Snow described 10 categories of
validity evidence needed for cognitive ability tests. Daniel Reschly looks at data
on the proportion of students classified as mildly disabled. He observed that African American
students are still overrepresented but the gap between these students and others
is narrowing. He presents the results of a study that lend support to the hypothesis
that poverty is a plausible explanation for much of this overrepresentation.
Early research on students at the lower tail of the distribution of reading skills
suggested that they could be meaningfully divided into two groups: one with a specific
reading disability identified by having IQ test scores significantly higher than
achievement scores and the other "slow learners" with IQ scores consistent with achievement
scores. Those in the first group are then identified as having a specific learning
disability. Jack Fletcher presented evidence challenging the validity of this "two
group" hypothesis and its use in identifying students with specific reading disability.
The report states that there have been 20 years of special education policies that
encouraged the use of assessment for classification and eligibility decisions in
a process quite separate from instructional planning. However, according to the
report, recent research has shown that assessment can and should be directly linked with instruction.
In addition, research shows that children can be identified early, before school
failure and low achievement become entrenched patterns. The report states that the
challenge is to implement these findings on a larger scale.
Ask Mr. Statistics.
Fortune 18 March, 1996, p. 137
Daniel Seligman, David C. Kaufman
Mr. Statistics discusses computer solitaire, which is packaged with Microsoft Windows
and has gotten millions of PC users hooked. One version you can play is standard solitaire
embellished with payoffs and called Vegas solitaire. The player is asked to imagine that he is paying $1 for each card, or $52 per game, and is returned $5 for each
card that can be "put away" on the piles you develop starting with the aces. If
you get all the cards played out you make $208. Unfortunately, this does not happen
very often. In an attempt to see how unfavorable the game is, Mr. Statistics played and
recorded 2,100 games. The most frequent outcome was a loss of $27, corresponding
to only 5 cards put out. Mr. Statistics estimated that the house edge was about 11%,
which corresponds to an average loss of $5.75 per game, or an expected 9.25 cards
played out. He estimated a 1 in 33 chance of getting all the cards out.
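The dollar figures in this paragraph all follow from the $52 stake and the $5 payoff per card. The short sketch below checks the arithmetic; the function names are ours and only the numbers come from the article.

```python
COST_PER_GAME = 52    # $1 for each of the 52 cards
PAYOFF_PER_CARD = 5   # $5 returned for each card put away


def net_winnings(cards_put_away):
    """Net result of one game in dollars (negative means a loss)."""
    return PAYOFF_PER_CARD * cards_put_away - COST_PER_GAME


def house_edge(mean_cards_put_away):
    """Expected loss as a fraction of the $52 stake."""
    return -net_winnings(mean_cards_put_away) / COST_PER_GAME


if __name__ == "__main__":
    print(net_winnings(52))              # all cards out
    print(net_winnings(5))               # the most frequent outcome
    print(net_winnings(9.25))            # the estimated average game
    print(round(house_edge(9.25), 3))    # estimated house edge
    print(COST_PER_GAME / PAYOFF_PER_CARD)  # cards needed to break even
```

The last line shows that a player must average more than 10.4 cards per game to come out ahead.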
Mr. Statistics observes that these are tentative estimates since there are certain
elements of strategy to the game and he is only human. He requested help from ProfNet
and did not find anyone who had analyzed this game.
How do you think a more accurate estimate of the value of this game could be obtained?
A treatment for a cancer risks another.
The New York Times, 21 March, 1996, A18
Girls who survive Hodgkin's disease face an exceptionally high risk of breast cancer
later in life, a study has shown.
More than 90 percent of victims of Hodgkin's disease in childhood are cured when they
get chemotherapy and low-dose radiation. But the treatment itself raises the risk
of cancer in later years.
These statements refer to a study in the current New England Journal of Medicine that
followed up 1,380 children treated for Hodgkin's disease between 1955 and 1986.
There were 88 second cancers. Only 4 would have been expected in this age group.
The risk was highest in those who were older when treated.
In the latest study, doctors estimate that women treated for Hodgkin's disease as
youngsters face a 35 percent risk of developing breast cancer by age 40. By age
45, this may reach 55 percent.
An accompanying editorial noted that the cancer has a higher rate of fatality than
the complications of treatment so "maintaining high cure rates remains the highest
priority to the management of childhood Hodgkin's disease."
(1) Norton Starr suggested this article and remarked that the headline is not supported
by the article. Why do you think he says this?
(2) Starr finds other problems with the account of the study as reported in the Times.
What other problems do you see?
(3) Neither the New York Times article nor the original article says anything about
the children who have Hodgkin's disease and are not treated by chemotherapy and low-dose
radiation. Should they have?
Scientific American, April 1996, 104-105
Stewart suggests that the popular game of Monopoly is a fair game based on a Markov
Chain analysis. He looks at a single player's movement around the board as a random
walk on a circle with forty points. The length of a step is determined by the outcome
of the roll of a pair of dice. Then standard Markov Chain theory shows that the long
range probability that the player is at any given point on the board is 1/40. He concludes
from this that Monopoly can be considered a fair game.
Stewart comments briefly on the fact that the first player is more apt to get properties
like Oriental Avenue or Vermont Avenue but suggests that winnings will get evened
out in the long run.
A much more realistic calculation of the limiting distribution can be found in "Monopoly
as a Markov Process", by Robert B. Ash and Richard L. Bishop (1972), "Mathematics
Magazine" Vol 45, 26-29. These authors make only very minor simplifying assumptions. In their more realistic model, the limiting probabilities are not equal. They observe
that much of the variation in the limiting probabilities is due to the effect of
going to jail. They also find the expected income from holding a group with hotels.
Green should be your favorite color.
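The contrast between the two models can be illustrated with a small computation. The sketch below is a deliberately crude stand-in for the Ash and Bishop model, not a reproduction of it. It follows one token around 40 squares, and its only departure from the plain random walk is an optional rule that landing on square 30 ("Go to Jail") moves the token to square 10. Chance and Community Chest cards, the doubles rule, and time spent in jail are all ignored.

```python
SQUARES = 40
GO_TO_JAIL, JAIL = 30, 10  # standard board positions, counting Go as square 0

# Probability of each total when rolling two fair dice.
DICE = {}
for a in range(1, 7):
    for b in range(1, 7):
        DICE[a + b] = DICE.get(a + b, 0.0) + 1 / 36


def long_run_probabilities(send_to_jail=False, turns=500):
    """Distribution of the square a token rests on after many turns."""
    probs = [0.0] * SQUARES
    probs[0] = 1.0  # the token starts on Go
    for _ in range(turns):
        new = [0.0] * SQUARES
        for square, p in enumerate(probs):
            for roll, q in DICE.items():
                landing = (square + roll) % SQUARES
                if send_to_jail and landing == GO_TO_JAIL:
                    landing = JAIL  # simplified jail rule (an assumption)
                new[landing] += p * q
        probs = new
    return probs


if __name__ == "__main__":
    plain = long_run_probabilities(send_to_jail=False)
    jail = long_run_probabilities(send_to_jail=True)
    for square in range(SQUARES):
        print(f"{square:2d}  {plain[square]:.4f}  {jail[square]:.4f}")
```

Without the jail rule every square ends up with probability 1/40. With it, the Jail square becomes the most likely resting place and the squares within one roll of it are visited more often than average.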
(1) Why might it be advantageous to go first in Monopoly? Do you think there is much
of an advantage in going first?
(2) Assume that there are three players of equal abilities. Estimate the probability
that each player wins.
(3) Which limiting probabilities do you think are most affected by going to jail?
Hawking fires a brief tirade against the lottery.
The Daily Telegraph, 4 February, 1996 p. 7.
Hawking writes in "Radio Times" that he thinks that gambling profits are a pretty
sleazy way to raise money even for good causes. The real interest in this article
is the comment that "Statisticians have determined that if you buy a National Lottery
ticket on a Monday you are 2,500 times more likely to die before the Saturday draw than
land the jackpot."
In long running wolf-moose drama wolves recover from disaster.
The New York Times, 19 March, 1996, C1
Isle Royale is an island about 35 miles long and 7 miles wide in the middle of Lake
Superior. I've been on the Island for the last sixty summers, but moose have been
there even longer than I have. In 1949, nine years after Isle Royale became a National
Park, the lake between Isle Royale and Canada was completely frozen and wolves came
across the ice to the Island. Since that time biologists have had a wonderful place
to study a predator-prey relationship.
Biologist Rolf Peterson has been studying this relationship since 1970. The park
is completely closed during the winter except for his annual trip to observe the
state of the wolf and moose herds.
Peterson's studies provide a wealth of data on this predator-prey relationship. From
his most recent annual report you can see a graph of the population of wolves and
moose from 1960 to 1994. In his new book "The Wolves of Isle Royale: A Broken Balance",
Peterson remarks that "for the period from 1959 to 1980 the wolf and moose population
appeared to cycle in tandem, with wolves peaking about a decade after moose." In
1980 there were 50 wolves and about 1000 moose. This was followed by a dramatic
drop in the number of wolves, believed to be caused by a disease brought to the Island by
visitors to the park. Recently the number of wolves dropped to as low as 10, and
this led to speculation that the wolves would die out. In addition to the low numbers,
concern about the future of the wolves comes from DNA studies that have verified that all
the wolves on the island have a single recent ancestor.
This winter, when Rolf made his annual seven-week study, he found that the wolves
had increased to 22 and the moose population was reduced to an estimated 1200, about
half the number of the previous year. Like the rest of us, the moose suffered a severe
winter this year. In addition they suffered from a heavy winter tick infection that
caused hair loss and made them more vulnerable to the cold.
Despite the continued concern about inbreeding, the possibility that this predator-prey
relationship will continue has dramatically improved.
(1) How do you think they estimate the number of moose on the Island?
(2) Would you expect the graphs of the moose and wolf populations through the years to
look like those produced by the classical predator-prey mathematical model?
(3) If the wolves die out, do you think they should be replaced?
New Yorker, January 15, 1996, 42-55
Sperm counts: some experts see a fall, others poor data.
The New York Times, 19 March 1996, C10
The "New Yorker" article is a typical long and very thorough discussion of the apparent
decrease in the quality and amount of sperm produced by men. A number of studies
in the last decade claim to show a dramatic drop in the sperm count. In addition,
these studies have shown that the quality of the sperm is also decreasing. The results of
a meta-study were published in the British Medical Journal in 1992 that reviewed
61 papers published between 1938 and 1991. The authors reported that the average
sperm count had declined from 113 million per milliliter to 66 million per milliliter. The randomness
of the movement of the sperm and the number of things that have to go right for success
in fertilization suggest that if these drops are real and continue they will have a serious effect on the fertility rate, to say nothing of the survival of the
The "New Yorker" article discusses the many theories that have been put forth to explain
this drop in sperm count. Some of the more interesting theories have to do with trying
to explain why the sperm counts for the Finnish men have not decreased while those for the neighboring Danish men have. The theory that modern industrialization is
the culprit receives support from the fact that Finland was industrialized much later
than Denmark. A related theory, that the damage is done by synthetic chemicals in
the environment from industries, is being popularized in a new book called "Our Stolen
The impression that one gets from reading the "New Yorker" article and other recent
articles is that the evidence for a dramatic decrease in sperm counts worldwide is
very convincing, with the obvious disaster if it continues. In her article, Gina
Kolata gives us some hope. She states that a significant number of experts have challenged
the methodology of many of the studies showing a decrease in the sperm count. One
of the principal concerns about earlier studies comes from recent studies showing
that there is considerable variation in average sperm counts not only between countries
but even within the same country. For example, the average sperm count in New York
is much higher than in Los Angeles. This would cause problems with some of the previous studies that compared data from one country at one time with data from different
countries at a later time.
The author of the New York Times article evidently had trouble with the scientific
notation, stating that the meta-study had shown that sperm counts had dropped from
1,130,000 per milliliter to 660,000 per milliliter. An astute student noticed that
both these numbers are less than the 20 million per milliliter below which it is considered
difficult to father a child.
(1) How do you think they estimate the number of sperm per milliliter?
(2) Most studies have been carried out by clinics where it is natural to collect
sperm counts -- fertility clinics, sperm banks, etc. Do you see any problems with
(3) What might be confounding factors in a study that compares sperm counts at different
(4) If it could be shown that fertility rates have decreased, would it follow that
sperm counts are down?
(5) How would you design a study to help settle this question?
Evaluation of the military's twenty-year program on psychic spying.
Skeptical Inquirer, March/April 1996, 20(2), 21-23
"An Assessment of the Evidence for Psychic Functioning" by Jessica Utts, available
from her home page on the web.
As discussed in Chance News 4.16, in the early 70's, the CIA supported a program to
see if ESP could help in intelligence gathering. Laboratory studies were done at
the Stanford Research Institute. In addition to this research, psychics were employed
to provide information on targets of interest to the CIA. The program was abandoned by
the CIA in the late 70's but taken over by the Defense Intelligence Agency (DIA)
until 1995 when it was suspended. The DIA studied foreign use of ESP, employed psychics,
and continued laboratory research at SRI and later at the Science Applications International
Corporation (SAIC) in Palo Alto, California.
The program was declassified in 1995 to allow an outside evaluation. The evaluation
was carried out by Ray Hyman and Jessica Utts. Hyman is a psychologist known for
his skepticism of psychic behavior and Utts is a statistician known to support the
reality of psychic behavior. Utts and Hyman concentrated their efforts on experiments carried
out after 1986 that were designed to meet the objections that the National Research
Council and other critics had aimed at previous studies. Hyman describes the typical experiment in the following way:
A remote viewer would be isolated in a secure location. At another location, a sender
would look at a target that had been randomly chosen from a pool of targets. The
targets were usually pictures taken from the "National Geographic". During the sending
period the viewer would describe and draw whatever impressions came to mind. After
the session, the viewer's description and a set of five pictures (one of them being
the actual target picture) would be given to a judge. The judge would then decide
which picture was closest to the viewer's description. If the actual target was judged
closest to the description, this was scored as a "hit".
Hyman and Utts agreed that
these experiments seem to have eliminated obvious defects of previous experiments.
They also agreed that the results of the ten best experiments could not reasonably be accounted
for by chance.
These articles by Utts and Hyman explain why, despite this agreement, Hyman remains
a skeptic and Utts a believer.
What would it take to convince you that ESP was a real phenomenon?
Hyman states that, even if one is convinced that the results are statistically significant,
this does not mean there is such a thing as extra sensory perception. What do you
think he means by this?
How safe are Tylenol and Advil? Helping patients sort out risks.
The New York Times, 27 March 1996, C11
Philip J. Hilts
Recent television attacks between Tylenol and Advil have led to confusion about the
risks associated with taking either drug. The risks being debated are small: 1/1000
to 1/100,000 people, depending on the side effect, for Tylenol (acetaminophen) and
Advil (ibuprofen). In addition, the risks seem to be confined to people who regularly
drink high levels of alcohol and take higher-than-recommended doses of pain relievers.
Acetaminophen has been named in several studies in the last eight years as the cause
of sudden liver failure in rare instances. In most of these cases, overdoses of
acetaminophen were taken (above the recommended 4 grams or 8 tablets a day). In
many cases, heavy drinking (6 drinks a day over a long period of time) was also found.
Ibuprofen also has a serious side effect -- it causes stomach irritation. Taken in
larger than recommended doses and in combination with alcohol, it can produce stomach
or intestinal bleeding severe enough to require hospitalization.
Dr. Brian Strom, the author of an editorial on the relative risk of the two drugs
in last December's issue of "The Journal of the American Medical Association", said
that a rough estimate would be that the number of cases of gastrointestinal bleeding
from ibuprofen was 50 to 100 times greater than the number of cases of liver disease caused
by acetaminophen. The reason for this is that the risk of bleeding is 50 to 100
times greater than the risk of liver disease to begin with, and it is known that
use of ibuprofen can double or triple the risk of bleeding. There is insufficient data
to say how much acetaminophen increases the risk of liver disease, but even if it
is assumed that it doubles or triples the risk, the number of cases is still far
lower than the number of cases for bleeding.
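Dr. Strom's reasoning can be made concrete with a little arithmetic. The sketch below rests on a simple model of our own, not one given in the article: the extra cases attributable to a drug are taken to be (relative risk - 1) times the baseline risk, and the two drugs are assumed to have equally many users. The function name is made up for the illustration.

```python
def excess_case_ratio(baseline_ratio, rr_ibuprofen, rr_acetaminophen):
    """Excess bleeding cases per excess liver case under the simple model.

    baseline_ratio: how many times more common bleeding is than liver disease
    rr_*: factor by which each drug multiplies its baseline risk
    """
    return baseline_ratio * (rr_ibuprofen - 1) / (rr_acetaminophen - 1)


if __name__ == "__main__":
    # Every combination of the ranges quoted in the article.
    for baseline_ratio in (50, 100):
        for rr_i in (2, 3):
            for rr_a in (2, 3):
                ratio = excess_case_ratio(baseline_ratio, rr_i, rr_a)
                print(baseline_ratio, rr_i, rr_a, ratio)
```

When both drugs multiply their baseline risks by the same factor, the ratio is just the baseline ratio of 50 to 100. Even in the combination least favorable to ibuprofen within these ranges, bleeding cases still outnumber liver cases by 25 to 1.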
Dr. Debra Bowen of the FDA said the risks of the two products should not be compared
since the risks are not the same for everyone. She suggests that those with liver
problems should be more concerned about acetaminophen while those with ulcers and
other gastrointestinal problems should worry about ibuprofen.
Do you agree with Dr. Bowen's comment that risks of the two products should not
be compared since the risks are not the same for everyone?
Neyer's stats class back in session.
ESPNET SportsZone 0318
ESPNET SportsZone 0325
The article examines three factors relating to NCAA tournament performance -- coaching
experience, point-guard experience and playing in one's home state.
Neyer gives the following data for first-time tourney coaches from 1986 through 1995.
The "Points For" column is the average number of points scored by the higher seed,
and the "Points Against" column is the average number of points scored by the lower
                          Games   Points For   Points Against   Differential   Effect
-- overall                 630       78.7          71.2             7.6         -3.9
First time -- high seed     37       74.8          71.2             3.7         -3.9
First time -- low seed      92       78.8          69.2             9.5         -1.9
Note that the differential between Points For and Points Against is low when the first-time
tourney coach has the high seed and high when he has the low seed, suggesting that first-time
coaches' teams do not do as well as the group as a whole.
Neyer gives similar data for coaches in their first year at a school. He concludes
that, based on the tournament data, a first-time tourney coach has a 2.5-point disadvantage
and a coach in his first year at a school has a 1.5-point disadvantage in the tourney. He concludes that you might not want a coach who is in both his first year at a school and
his first tourney. The odds would be against him.
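The Effect column for first-time tourney coaches can be recovered from the Differential column, and a games-weighted average of the two effects comes out close to the 2.5-point figure. The article does not say how Neyer combined the rows, so the weighting below is our guess, and the function names are made up for the illustration.

```python
OVERALL_DIFFERENTIAL = 7.6  # higher seed's average margin over all 630 games

# (games, differential) when the first-time tourney coach has the high or low seed
FIRST_TIME = {"high seed": (37, 3.7), "low seed": (92, 9.5)}


def effect(seed, differential):
    """Change in margin from the first-time coach's point of view."""
    shift = differential - OVERALL_DIFFERENTIAL
    # A larger margin for the higher seed is bad news for a low-seed coach.
    return shift if seed == "high seed" else -shift


def pooled_effect(rows):
    """Average of the effects weighted by games (an assumed way of pooling)."""
    total_games = sum(games for games, _ in rows.values())
    weighted = sum(games * effect(seed, diff) for seed, (games, diff) in rows.items())
    return weighted / total_games


if __name__ == "__main__":
    for seed, (games, diff) in FIRST_TIME.items():
        print(seed, games, round(effect(seed, diff), 1))
    print("pooled:", round(pooled_effect(FIRST_TIME), 1))
```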
Again using data of the same form, Neyer shows that experience of the point-guard
seems to help up to the junior year, but then, mysteriously, senior point-guards
did not help as much as junior point-guards. Finally he shows that teams that play
in their home state have about a 4.8-point advantage, comparable to that found in studies of
home team advantage.
DISCUSSION QUESTIONS SUGGESTED BY TODD NICK:
(1) What about the interaction between first year coach and first tourney? Is
there interaction? That is, maybe a first year coach in the tourney that is his
first tourney might be an advantage.
(2) Are we losing anything by collapsing the data into the two groups: high seed
(top 8) and low seed (bottom 8)? How about 8th and 9th seeds? Is there really a
difference? This year 3 of the 4 9th seeds won. What are the advantages and disadvantages
of collapsing data?
(3) Might the fact that the best sophomore and junior point-guards usually go pro
early help explain why the senior point-guards did not help as much as the junior point-guards?
Also, is there really such a thing as a freshman point-guard?
Fetal heart monitor called not helpful.
The Boston Globe, 7 March 1996, p 3
It has been standard medical practice to electronically monitor fetal heartbeats during
delivery, and to perform Cesarean sections when the monitors detect certain abnormal
patterns. The goal is to reduce the risk of cerebral palsy, a disability resulting from damage to the brain's motor centers. However, in a California study of 156,000
live births, researchers from the National Institute of Neurological Disorders and
Stroke found that this practice does not help prevent cerebral palsy. While some
of the heartbeat abnormalities detected were associated with cerebral palsy, the article
reports that children delivered by Cesarean section did not have a lower frequency
of cerebral palsy than those delivered vaginally.
(1) If it has been standard medical practice to perform C-sections whenever certain
patterns are observed, then what groups do you think are being compared in the last
(2) The article goes on to say that "the researchers also found that in the vast majority
of babies, abnormal heartbeats did not indicate cerebral palsy: 99.8% of babies
in whom such heartbeats were detected, and who were delivered by C-section, did not
develop cerebral palsy." What else would you like to know?
Intelligence: knowns and unknowns.
American Psychologist, 51(2),77-101
Ulric Neisser et al.
In the Fall of 1994, the book "The Bell Curve" by Herrnstein and Murray discussed
the concept of intelligence, its measurement, and its relation to human behavior
and political decisions. This book led to heated discussions in the press. Believing
that when science and politics are mixed, scientific studies tend to be evaluated in relation
to their political implications rather than to scientific merit, the "American Psychological
Association" appointed a task force to provide an authoritative report on the present state of knowledge on intelligence.
While the report was inspired by "The Bell Curve", the authors make no attempt to
analyze the arguments in this book. Rather they follow their charge: "to prepare
a dispassionate survey of the state of the art: to make clear what has been scientifically
established, what is presently in dispute, and what is still unknown." While less
lively reading than the "The Bell Curve" and its critics, the report is well written
with a minimum of jargon. It provides an admirably balanced view of the present
state of knowledge on the issues that it addresses:
What are the significant conceptualizations of
intelligence at this time?
What do intelligence test scores mean, what do
they predict, and how well do they predict it?
Why do individuals differ in intelligence and
especially in their scores on intelligence tests?
In particular, what are the roles of genetic and
Do various ethnic groups display different patterns
of performance on intelligence tests, and, if so, what
might explain these differences?
What significant scientific issues are presently
It is a pity that readers of "The Bell Curve" did not have this article as a reference
to check out some of the claims made in "The Bell Curve".
An interesting analysis of arguments in "The Bell Curve" by four statisticians can
be found in "Galton Redux: Eugenics, Intelligence, Race, and Society: A Review of
'The Bell Curve: Intelligence and Class Structure in America'", Journal of the
American Statistical Association, 90(432), 1483-1488 by Devlin, Fienberg, Resnick, and Roeder
and the related article by the same four authors: "Wringing 'The Bell Curve'", Chance
HMO prescription limit found to result in more doctor visits.
The Boston Globe, 20 March 1996, p4
Richard A. Knox
In efforts to control costs, an estimated 3 out of 4 HMOs place limitations on drugs
that their participating physicians can prescribe. A study appearing in the "American
Journal of Managed Care" asserts that such practices may cost more than they
save, because they ultimately lead to more visits to doctors.
The study followed 13,000 subscribers to 6 unnamed HMOs in locations from New England
to the Southwest. Among patients with arthritis, asthma, high blood pressure and
stomach ulcers, a positive association was found between greater limitations on
prescriptions and more doctor visits, emergency room use and hospitalizations. Patients in
the most restrictive plan saw the doctor 83% more often than did those in the plan
without restrictions. Also, the use of generic drugs rather than brand-names was
associated with higher yearly drug costs and more doctor visits.
(1) Comparing the HMO with no prescribing limits to the most restrictive one, researchers
reported that doctors in the latter actually prescribed more than twice as many drugs,
at more than twice the total cost. How can this be?
(2) Critics have pointed out that the $500,000 cost of the study was funded by a branch
of the pharmaceutical industry. Why should we be concerned about this?
'Seeding' prostate cancer away: radioactive implants prove effective.
The Boston Globe, 25 March 1996, p3
An improved procedure for treating prostate cancer is becoming competitive with the
common surgical procedures, which often lead to side effects such as impotence and
incontinence. The new procedure, called "brachytherapy," involves implanting tiny
radioactive seeds throughout the prostate. This was formerly done through an abdominal
incision, placing the seeds by feel; the latest improvement uses needles inserted from
outside the body, guiding the seed placement with ultrasound images.
In a study of 320 men with early prostate cancer (confined to the gland), only six
patients experienced a recurrence during the 7-year follow-up to the implant treatment.
Only seven patients suffered incontinence following the implant; the impotence
rate was 25 to 30 percent, and increased with age.
According to Dr. Haakon Ragde, director of the Pacific Northwest Cancer Foundation,
surgical removal of the gland results in incontinence in about 20% of patients, and
a "higher rate of impotence," although he could not give an exact figure.
(1) What do you make of the fact that no exact figure could be given for the impotence
rate?
(2) The above data indicate about a 10-fold reduction in the incontinence rate. Let's
suppose the impotence rate actually increases with the new procedure. How would
one trade this off against the improvement in incontinence rate?
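The "10-fold" figure in question (2) can be checked from the numbers quoted in the article. The following is a minimal Python sketch, assuming that "about 20%" is treated as exactly 0.20 and that all 320 men were followed for incontinence; the variable names are illustrative only.

```python
from fractions import Fraction

# Figures quoted in the article.
n_patients = 320                          # men in the implant study
n_incontinent = 7                         # incontinence after the implant
n_recurrence = 6                          # recurrences during follow-up
surgery_incontinence = Fraction(20, 100)  # "about 20%", taken as exact (assumption)

implant_incontinence = Fraction(n_incontinent, n_patients)
recurrence_rate = Fraction(n_recurrence, n_patients)

# How many times smaller the implant incontinence rate is than the surgical one.
reduction_factor = surgery_incontinence / implant_incontinence

print(f"implant incontinence rate: {float(implant_incontinence):.1%}")
print(f"recurrence rate:           {float(recurrence_rate):.1%}")
print(f"reduction factor:          {float(reduction_factor):.1f}")
```

Under these assumptions the implant incontinence rate is a little over 2%, so the reduction is roughly 9-fold, which is consistent with "about a 10-fold reduction."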
The Washington Post, Pg. C05, 10 March 1996
Love, Marriage, and the IRS
Morin reports that a significant percentage of Americans are delaying marriage or
speeding up divorce in response to the "marriage tax." This marriage tax is the
amount in federal taxes that married couples pay above what they would have paid
if they were single (it was imposed three years ago).
Economists James Alm and Leslie A. Whittington have found that the probability of
marriage falls and that of divorce rises with an increase in the marriage tax. They
calculate that a 20% reduction in the tax would produce a 1% increase in the number
of marriages. They also found a small but statistically significant increase in the divorce
rate following an increase in the marriage tax.
In addition, increases in the marriage tax cause some couples to delay the timing
of their marriage from one tax year to another. Economists David Sjoquist and Mary
Beth Walker analyzed four decades of data and found small but statistically significant
shifts in the number of late-year weddings following changes in tax policy.
Economists Daniel Feenberg and Harvey Rosen calculated the cost of marital bliss in
52% of American couples paid an average marriage tax of
38% of American couples received an average subsidy
10% of American couples broke even.
The marriage tax was usually paid by couples in which
both partners worked, while the marriage subsidy was
collected by single-earner households.
Taken together, all married couples paid an average
of $124 in extra taxes in 1994.
The Washington Post, 24 March 1996, p. C5
A. Majoring in Money
In the latest issue of "Monthly Labor Review", Daniel Hecker, an economist for the
Federal Bureau of Labor Statistics, computed the annual earnings of everyone who
had graduated from college before 1991 and had a full-time job in 1993. He found
that the least lucrative majors for both men and women are philosophy, religion, and theology.
Women college graduates in mid-career (between the ages of 35 and 44) made an average
of $32,155 a year while men made $43,199 a year. As reasons for this income gap between
the sexes, economists cite sex discrimination, family and lifestyle choices, and the fact
that many women choose or are pushed toward majors that lead to less lucrative careers,
such as social work, home economics, and teaching.
Morin reports that the top five most lucrative majors for men are engineering, mathematics,
computer science, pharmacy, and physics. For women, the top five most lucrative
majors are economics, engineering, pharmacy, architecture, and computer science.
B. Tips and Smiley Faces
Temple University psychologists Bruce Rind and Prashant Bordia have found that penning
a smiley face or scrawling "thank you" on a customer's bill can boost a waiter's
or waitress' earnings.
At an upscale Philadelphia restaurant, a waitress and waiter drew happy faces on the
checks of half their customers before presenting the bills (a total of 89 dining
parties). There was a 19% increase in tips for the waitress, but a slight decrease
in tips for the waiter. Rind and Bordia said that the difference in tips resulted from
the fact that such expressive behaviors are acceptable from females. On the other
hand, when a male server tries to be friendly, customers perceive him as strange.
Does it surprise you that mathematics is ahead of computer science in the top five
most lucrative majors?
Does the explanation for the difference in customers' reactions to men and women drawing
faces seem reasonable to you?
Why does toast always land butter-side down?
Sunday Telegraph, 17 March, 1996, p 4
Robert Matthews has a mission to explain that Murphy's law is not just selective memory
but rather there are good scientific reasons for most of the things you blame on
Murphy's law. He starts his story with the history of the Law which has been traced
to a Captain Ed Murphy of the US Air Force in the late 1940's. He was involved in a
project involving experiments into the effects of rapid deceleration on the human
body. His project director, announcing the results at a press conference, joked
with reporters that if there was a wrong way to do something, Captain Murphy would always
find it. "We call it Murphy's law."
Examples of phenomena often attributed to Murphy's law can be found in earlier literature.
For example, James Payn wrote "I had never had a piece of toast / Particularly long
and wide / But fell upon the sanded floor / And always on the buttered side."
Matthews points out that simple experiments or calculations will show that the height of
a table is such that a slice of toast sliding off the table will have just enough angular
velocity to turn it over so that it lands butter side down. (You
can verify this with a paperback book if you don't have toast handy).
Another example he gives has to do with odd socks. Murphy's law of Odd Socks is that
if an odd sock can be created it will be. If you start with a drawer of 10 complete
pairs and you lose just six socks at random, then it is 100 times more likely that
you will be left with the worst possible outcome -- six odd socks-- than with a drawer
free of odd socks. If you have ten pairs of socks in your drawer and they get mixed
up you will have to rummage through about 30 percent to find one matching pair. If
you can add to examples of Murphy's Law that may be scientifically based you are invited
to write to Robert Matthews at The Sunday Telegraph, 1 Canada Square, Wharf, London
E14AR. If it is easier to send them to me I will pass them on.
(1) Are Robert Matthews's sock odds correct?
(2) Can you give another good example of Murphy's law that can be explained "scientifically"?
(3) Murphy's supermarket law says that whatever line you get in, another will go faster.
Can this be explained scientifically?
(4) And what about Murphy's bus law that says that every time I wait for a bus going
South the next bus that comes by is going North?
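Question (1) can be explored by direct counting. The sketch below is one possible reading of Matthews's two sock claims, not his own calculation: it assumes every set of six lost socks is equally likely, and it interprets "rummage through about 30 percent" as the expected number of socks drawn one at a time, without replacement, until the first matching pair appears. All names are illustrative.

```python
from math import comb

PAIRS = 10
LOST = 6
TOTAL_SOCKS = 2 * PAIRS

# Worst case: the six lost socks come from six different pairs,
# leaving six odd socks. Choose the pairs, then one sock from each.
worst = comb(PAIRS, LOST) * 2 ** LOST

# Best case: the six lost socks form three complete pairs,
# leaving no odd socks at all.
best = comb(PAIRS, LOST // 2)

ratio = worst / best


def prob_no_match(k):
    """Probability that the first k socks drawn contain no matching pair."""
    if k > PAIRS:
        return 0.0
    return comb(PAIRS, k) * 2 ** k / comb(TOTAL_SOCKS, k)


# If N is the draw on which the first pair is completed, then
# E[N] = sum over k >= 0 of P(N > k) = sum of P(no match in first k draws).
expected_draws = sum(prob_no_match(k) for k in range(PAIRS + 1))

print(f"worst case is {ratio:.0f} times as likely as the best case")
print(f"expected draws to first pair: {expected_draws:.2f} "
      f"({expected_draws / TOTAL_SOCKS:.0%} of the drawer)")
```

Under these assumptions the ratio comes out to 112, in line with "100 times more likely," and the expected search length is a bit under six socks, or a little less than 30 percent of the drawer.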
Parade Magazine, 3 March 1996, p 14
Marilyn vos Savant
A reader asks:
"My dad heard this story on the radio. At Duke University, two students had received
A's in chemistry all semester. But on the night before the final exam, they were
partying in another state and didn't get back to Duke until it was over. Their excuse
to the professor was that they had a flat tire, and they asked if they could take a
make-up test. The professor agreed, wrote out a test and sent the two to separate
rooms to take it. The first question (on one side of the paper) was worth 5 points,
and they answered it easily. Then they flipped the paper over and found the second question,
worth 95 points: 'Which tire was it?' What was the probability that both students
would say the same thing? My dad and I think it's 1 in 16. Is that right?"
Marilyn says the chances are better--1 in 4. This is correct if we assume the students
were lying about the flat and each now independently guesses a tire at random. Marilyn's
solution notes that, for each possible choice by the first student, the second student
has a 1/4 chance of matching. This implicitly invokes the Law of Total Probability.
The following is an equivalent, but more direct approach. The answer is the sum
of the probabilities of four (disjoint) events: both guess front right, both guess
front left, both guess rear right, both guess rear left. Each of these has the
1/16 chance the reader apparently had in mind; summing them gives 1/4.
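The direct approach can be illustrated by listing all 16 equally likely pairs of guesses and counting those that agree. This is a small Python sketch under the same assumption as above (each student guesses independently and uniformly at random); the names are illustrative.

```python
from fractions import Fraction
from itertools import product

TIRES = ["front right", "front left", "rear right", "rear left"]

# Every (student 1 guess, student 2 guess) pair; each has probability 1/16.
outcomes = list(product(TIRES, repeat=2))

# The four disjoint events in which both students name the same tire.
matches = [pair for pair in outcomes if pair[0] == pair[1]]

p_match = Fraction(len(matches), len(outcomes))
print(len(outcomes), len(matches), p_match)  # 16 4 1/4
```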
Editor: This story appeared in the April 4, 1994 edition of the San Francisco Chronicle
in an article written by Jon Carroll. He said he got it from jimklent aol.com who
said he heard it from a former student who in turn heard it from a good friend who
swears that it is true. The professor involved was named Bonk, and so I sent an e-mail
message to a Professor Bonk at Duke and got the following reply.
The story is part truth and part urban legend. It is based on a real incident
and I am the person who was involved. However, it happened so long ago that I do
not remember the exact details anymore. I am sure that it has been embellished to
make it more interesting.
Professor Bonk included a related e-mail message he had received from Professor Roger
Koppl, an economist in the J. Silberman College of Business Administration at Fairleigh
Dickinson University. The survey referred to in this note was sent to faculty in
FDU's Becton College.
Nine people answered my query. I thank them all. Now, what was I up to? When I read
the story of Professor Bonk I thought immediately of the right front tire. I was
then reminded of something economists call a "Schelling point," after the Harvard
economist Thomas Schelling. Schelling had the insight that certain places, numbers, ratios,
and so on are more prominent in our minds than others. He asked people to say where
they would go to meet someone if they were told (and knew the other was told) only
the time and that it would be somewhere in New York. Most chose Grand Central Station.
How to divide a prize? 50-50. And so on. The existence of these prominent places
and numbers and such permit us to coordinate our actions in contexts where a more
"pure" and "formal" rationality would fail. These prominent things are called "Schelling
points." It turns out that "right front" was indeed the most popular answer.
Of the nine respondents, seven named a tire. I had a tire in mind too. Let's think
of the sample size, then, as eight. Here is the distribution:
               number   per cent
Right Front:      5       62.5
Left Front:       0        0
Right Rear:       2       25
Left Rear:        1       12.5
If these percentages held for the whole population, then the probability that the
two students would give the same answer would be about .47. This is considerably
greater than the probability of .25 that would hold if each tire scored 25%. The
expected value of each student's grade on the final would be about 50 (5 + 95 x .47),
not 29 (5 + 95 x .25). My calculations are
shown in the PS.
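The PS is not reproduced here, but the calculation can be sketched as follows. This is a reconstruction in Python, not Professor Koppl's own working; it assumes the two students answer independently with the survey proportions, that both get the 5-point question right, and that the 95-point question is all or nothing.

```python
from fractions import Fraction

# Survey counts from the table above (sample size eight).
counts = {"right front": 5, "left front": 0, "right rear": 2, "left rear": 1}
n = sum(counts.values())

# P(both name the same tire) = sum over tires of p_i squared.
p_match = sum(Fraction(c, n) ** 2 for c in counts.values())
p_uniform = Fraction(1, 4)  # every tire equally likely


def expected_grade(p):
    """5 points for the first question plus 95 points with probability p."""
    return 5 + 95 * p


print(p_match, float(p_match))            # 15/32 0.46875
print(float(expected_grade(p_match)))     # 49.53125
print(float(expected_grade(p_uniform)))   # 28.75
```

With these inputs the match probability is 15/32, about .47, and the expected grades are about 49.5 and 28.75 respectively.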
Thanks again to all who indulged me. I enjoyed testing to see if "right front" is
a Schelling point.
To: J. Laurie Snell:
If you are considering the Bonk story, you might like my own small contribution, which
I picked up from my local tire salesman: the most likely tire to be punctured
is the right rear one. That is because road debris tends to accumulate at the gutter
side of the road. The front tire is usually not damaged when it runs over something
like a nail, because nails normally lie flat on the pavement. However, the front
tire may kick up the nail, so that before the nail has time to fall back, it is caught
by the right rear tire. My salesman tells me that he used to enjoy "psychically"
telling customers which tire had been punctured, until he grew tired of this game
for lack of sport. If both students in Dr. Bonk's story were really road-knowledgeable,
they might both guess right rear, which would give them a win in the make-up test.
--Paul S. Boyer
Department of Chemistry & Geology
Dear Prof. Snell,
Prof. Koppl told me that you were interested in his result on the Prof. Bonk story. I
don't know if you are still interested but I also told the story to my students
and ran an informal experiment. In three undergraduate classes, the results were:
out of 16 students, 8 chose the right front tire
out of 24 students, 11 chose the right front tire
out of 20 students, 12 chose the right front tire
In all 3 classes the right front (not the driver side) was systematically the
"most popular" tire.
I hope it helps! Regards,
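To compare the classroom results with one another, the reported counts can be converted to proportions. The sketch below uses only the three counts given in the letter; pooling the classes into a single sample is an assumption made for illustration.

```python
from fractions import Fraction

# (students choosing right front, class size) for the three classes.
classes = [(8, 16), (11, 24), (12, 20)]

# Share of each class that chose the right front tire.
shares = [Fraction(chose, size) for chose, size in classes]

# Pooled share, treating the three classes as one sample (assumption).
pooled = Fraction(sum(c for c, _ in classes), sum(s for _, s in classes))

for (chose, size), share in zip(classes, shares):
    print(f"{chose} of {size}: {float(share):.1%}")
print(f"pooled: {pooled} = {float(pooled):.1%}")
```

The right front share ranges from about 46% to 60% across the classes, and the pooled share is 31 of 60, a little over half.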
Why such a different result in the two polls?
Send comments and suggestions to firstname.lastname@example.org