CHANCE News 9.07
June 7, 2000 to July 2, 2000
Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead,
with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
Back issues of Chance News and other materials for teaching a
Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License
(so-called 'copyleft'). See the end of the newsletter for
Chance News is best read with Courier 12pt font and 6.5" margin.
He uses statistics like a drunken man uses a
lamp post, more for support than illumination.
Contents of Chance News 9.07
Note: If you would like to have a CD-ROM of the Chance Lectures
that are available on the Chance web site, send a request to
firstname.lastname@example.org with the address where it should be sent.
There is no charge. If you have requested this CD-ROM and it has
not come, please write us again.
May 2000 Vol 27 #9 RSS News:
People living around the town centre [Luton] are at
the bottom of the poverty-induced ill-health pile,
statistics released on Friday, reveal, with some
residents nearly 25 per cent more likely to die
than the national average.
Luton on Sunday
29 November 1998
Women who use HRT for long periods are slightly more
at risk of breast cancer but they do not seem to die
more often from the disease.
Temperatures of around nine centigrade, at a time of the
year when nearly double that is normal, also added to the
problems of the organizers [of a tennis competition].
BBC Ceefax p335
7 March 2000
(1) Why can't we find our own Forsooth items?
(2) Why is the last item a forsooth item? Do you think
it deserves a forsooth!?
In Chance News 9.06 we mentioned Rex Boggs' interesting web site
to assist teachers of the basic statistics courses but did not
give the URL in the e-mail version of Chance News 9.06. Here it is:
In Chance News 9.05 we discussed an ABC report that stated that
84% of those killed by lightning are men. In a discussion
question we asked why women are so much better off? We have
received four suggestions:
(1) Men are taller
(3) Men seem to enjoy risking their lives more than women do.
(4) Men tend to work outside more often.
(1) The last explanation was suggested by Tom Kotsos who helps
Dr. Mary Cooper maintain a lightning web site.
At this web site, under "Lightning Injury Facts" you can learn
which of the many claims about lightning are myths and which are
facts. Under Epidemiology (a. Gender) you will find that the
claim that about 84% of the deaths are males has been documented
by studies in several different countries.
National Climatic Data Center. This data set contains
information about lightning injuries and deaths since 1959. This
includes gender and location. The location categories are:
(1) Under trees
(2) In or near water, boating
(4) Under trees on golf course
(5) Farming, construction or near heavy equipment
(6) Out in the open: fields, playgrounds, ballparks, yard, street
(7) Telephone related
(8) Other electronics related: radio, TV
(9) Various other or unknown
Do you think the number in each category will be more or less in
the order of the categories? After looking at the deaths by
location and gender, will we know anything more about why more men
are killed than women?
concert in the Royal Festival Hall, London, in which pianist
Andras Schiff played Bach's Well-Tempered Clavier (Book II). In
her review Maddocks wrote:
Probability Theory no doubt could give a better
equation, but even common sense tells us that if
an (optimistic) five per cent of the audience has
a cold and each of them coughs randomly three times
in a long evening with precious few opportunities to
clear the tubes, we'll be running at two coughs a
minute. Mr Schiff ought to cancel any February
Could probability theory do better than common sense?
In the June 1 issue of John Paulos' ABCNEWS.com column "Who's
Counting," Paulos discusses the Parrondo paradox: two simple coin
tossing games each of which is unfavorable but if, for a
sequence of plays, you randomly choose one of the two games to
In his July 1 column, Math Vs. Miracles, Paulos uses the recent
news about miracles, such as those relating to Mother Drexel and
Fatima, to ask what are miracles and how are they related to
Paulos provides his usual insightful discussion of both of these
topics. To find the June column go to:
Also remember that Vital Stats is a monthly newsletter that also
discusses a number of interesting articles with no overlap with
our own choices. This makes us wonder how many interesting articles
we both miss.
Inside the happiness business.
New York, 15 May, 2000
David D. Kirkpatrick
Dr. Robert Goodman
New England Journal of Medicine, 22 June, 2000, 342(25), p. 1902
Editorial by Marcia Angell (Can be read on No free lunch site)
The New York magazine article is a popular account of how drug
companies try to educate doctors about the merits of their
products. It contains a case study of a particular drug Celexa
launched in 1998 to compete with well known anti-depressants
Prozac, Zoloft, and Paxil. Kirkpatrick states that Celexa is the
only one whose market share is increasing (it now has more than
13% of the $6.3 billion market). According to the article:
The reason for Celexa's stunning success is not science
but marketing. Drug-industry consultants Scott-Levin
say U.S. pharmaceutical companies spent about $10 billion
last year on drug promotions. Most of that--$9 billion --
went toward marketing to doctors (about $12,000 for each
doctor in the U.S.). Drug makers command an army of more
than 68,000 sales people, one for every eleven doctors in
The rest of the article explains in more detail how this army of
sales people tries to influence doctors' choices. This
includes wining and dining at the fanciest restaurants, free pens,
coffee mugs, stethoscopes, textbooks for medical students, and
samples of their drugs. Also drug companies sponsor seminars on
topics relating to their drugs, invite doctors to serve on
advisory boards for new drugs, and support their research.
Of course, all this is just free enterprise at work. The real
question is whether doctors are unduly influenced by the perks they
get when prescribing medicine for their patients. Dr. Goodman is
one doctor who feels the answer is a resounding YES! He has
established the "no free lunch" web site to try to convince his
fellow doctors not to accept free lunches. He reviews the evidence
that doctors are unduly influenced in his "Free lunch slide
presentation" that you will find at the bottom of his homepage.
We recommend that you view this presentation. You will need to
know what a formulary is. Here is how The California Internet
Formulary defines formulary.
A formulary is a list of prescription drugs that a health
plan has approved for use by doctors. Health plans that have
formularies develop their own unique list of "approved
drugs." Formularies may change at any time.
Health plans may only pay for medications that are on this
"approved" list, unless your doctor goes through the health
plans Prior Authorization process.
Goodman refers in his slide presentation to a study, "Physicians'
behavior and their interactions with drug companies" by Chren and
Landefeld, JAMA, 271(9), 2 March 1994, pp 684-689.
In this study the researchers compared two groups of physicians at
the University Hospitals of Cleveland: one was the 40 physicians
who, in a one year period, had requested that drugs be added to a
hospital formulary and the other a group of 80 chosen randomly
from a group of 330 who had not made such requests. From the slide
show we learn that:
Physicians who had requested formulary changes were
more likely to have accepted money from drug companies
to attend or speak at symposia (OR=5.1, 95% CI, 2-13.2)
Physicians were more likely to have requested additions
of drugs made by companies with whose reps they had met
(OR = 4.9, 95% CI, 3.2-7.4)
Another study, "The effects of pharmaceutical firm enticements on
physician prescribing patterns" by Orlowski and Wateska, Chest, July
1992, 102(1), 270-273, showed that all-expense paid trips to
luxurious resorts changed physicians' patterns for prescribing
drugs even though they believed these trips had no effect.
The pharmaceutical industry has also come under attack for the
high price of their drugs and their inability to provide
affordable drugs for developing countries. The companies claim
that they have to spend years developing a new drug, it is a risky
business, and their sole responsibility is to their shareholders.
In an editorial, outgoing editor-in-chief of the New England
Journal of Medicine, Marcia Angell, challenges these explanations.
Angell points out that in many cases the risk is removed by the
fact that the government pays for the early research which
determines the drug's viability. She comments that the fact that
over the past ten years the pharmaceutical industry has been by
far the most profitable industry in the United States does not
suggest that it is a risky business.
(1) Of course there is a positive side to the interaction of the
doctors with the reps: it is important to learn about new drugs,
the samples can be given to poor patients who cannot afford the
drugs etc. Do you think these compensate for the possible bias
(2) What do you think would happen to a judge who accepted fancy
dinners from representatives of a company involved in a case
before the judge?
Drinkers can raise a toast to new study. Alcohol may reduce
Alzheimer's risk, but moderation's the key.
Chicago Tribune, 11 June, 2000, Sec. 1 p. 1
Effects of smoking, alcohol and APOE genotype on Alzheimer's
Alzheimer's Reports, Vol. 3 Issue 2, 2000
Cupples et al.
Numerous studies have suggested that moderate drinking can help
prevent heart attacks. See, for example, Chance News 4.16. The
Chicago Tribune discusses a study, reported in the journal
Alzheimer's Reports, that suggests that alcohol can also help
prevent Alzheimer's disease. The study also considered the effect
of smoking since previous studies have suggested that smoking
might prevent Alzheimer's disease. This new study did not support
The study was a case-control or retrospective study. Such a study
is designed to see if some previous behavior or characteristic is
a risk factor for or helps prevent a disease. A case-control study
chooses a group that has the disease (cases) and a group that does
not have it (controls) and compares the proportion in each group
that has the behavior or characteristic being studied.
The cases in this study were 238 Boston-area patients who had
Alzheimer's disease and the controls were 699 people obtained by
attempting to match the 238 cases with 3 subjects of the same
gender and age. The controls were chosen from the ongoing
Framingham Study. Subjects were classified according to their
consumption of alcohol into three groups: low, moderate, high.
Men who drank less than .25 drinks per day were classified as low,
more than .25 but at most 2 as moderate, and more than 2 as high.
For women, less than .25 were classified as low, .25 up to as much
as 1 as moderate, and more than 1 as high.
The Tribune article presented two bar graphs that gave the
percentage of low, moderate, and high persons in the case and in
the control groups. Knowing these numbers and the totals in each
group provides the following table.
            cases (%)      controls (%)
low         116 (48.7)     257 (36.8)
moderate     87 (36.5)     282 (40.3)
high         35 (14.8)     160 (22.9)
total       238            699
Under the bar graphs, the conclusions were stated as: moderate
drinkers were 60 percent as likely to develop Alzheimer's disease
as non-drinkers and heavy drinkers were 50 percent as likely to
develop Alzheimer's as non-drinkers.
These conclusions are expressed in terms of relative risk. For
example, the 60 percent should be the ratio of the proportion of
moderate drinkers in the population whom we expect to get
Alzheimer's disease to the proportion of low drinkers (considered
non-drinkers here) whom we expect to get Alzheimer's disease. If we
used the information in the table we would estimate that the
chance that a moderate drinker would get Alzheimer's is
87/(87+282) = .236 and for a non-drinker it is 116/(116+257) =
.311. This would give a relative risk of .236/.311 = .758. But
this is not a correct
estimate for the population as a whole, since the number of
cases and controls were determined by the researchers. If
they had used more controls their estimate would be lower
just because they had more controls! Thus Tom Falcone, and
we also, could not see where the 60 percent and the 50 percent
relative risks came from. This led to our trying to solve
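The dependence of this naive estimate on the number of controls can
be checked numerically. What follows is a minimal Python sketch of
our own, not part of the study or the Tribune article. The function
name naive_relative_risk and the control_scale multiplier are
hypothetical devices: control_scale simply pretends the researchers
had recruited that many times more controls with the same drinking
habits.

```python
# Counts from the table above (cases have Alzheimer's, controls do not).
cases = {"low": 116, "moderate": 87, "high": 35}
controls = {"low": 257, "moderate": 282, "high": 160}


def naive_relative_risk(group, reference="low", control_scale=1.0):
    """Ratio of the naive 'risks' cases / (cases + controls).

    control_scale is a hypothetical multiplier (an assumption for
    illustration): it mimics a study that recruited more or fewer
    controls with the same distribution of drinking habits.
    """
    def naive_risk(g):
        return cases[g] / (cases[g] + control_scale * controls[g])

    return naive_risk(group) / naive_risk(reference)


if __name__ == "__main__":
    # The same drinking habits, but different numbers of controls,
    # give different "relative risks" for moderate drinkers.
    for scale in (1, 3, 10):
        rr = naive_relative_risk("moderate", control_scale=scale)
        print(f"controls x{scale}: naive relative risk = {rr:.3f}")
```

With control_scale=1 the function reproduces the .758 computed
above; larger values give smaller ratios, which is why the naive
estimate cannot be trusted.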
It turns out that epidemiology does tell us how to estimate
relative rates from data from case-control studies. Jerome
Cornfield was the first to tell us how in his paper: "A method of
estimating comparative rates from clinical data", Journal of the
National Cancer Institute 11:1269, 1951. We follow his approach in
our explanation of how this is done.
Let X be the proportion of the population that has Alzheimer's
disease during a specific period of time. Assume that the
distribution of our three categories of drinkers that we found in
the case-control study is representative of this population. Then
we can summarize the relevant data for the general population as
            has Alzheimer's    does not have Alzheimer's
low         .487X              .368(1-X)
moderate    .365X              .403(1-X)
high        .148X              .229(1-X)
From this, we can find the proportion of the population in each
category of drinkers that has Alzheimer's disease. These
Low       .487X/(.368 + .119X)
Middle    .365X/(.403 - .038X)
High      .148X/(.229 - .081X)
Thus if we know X we can obtain the desired relative risks for
those in each of the categories of drinkers.
If we do not know X, but can assume that Alzheimer's is a
relatively rare disease, so X is small, we can approximate these
Low .487X/.368 = 1.323X
Middle .365X/.403 = .906X
High .148X/.229 = .646X
Now the relative risk for Alzheimer's disease, choosing low as our
reference, becomes
Low = 1
Middle = .906/1.323 = .68
High = .646/1.323 = .49
and is independent of X. Thus we end up with approximations to the
relative risk that use only information obtained from our
original case-control study. The two assumptions that we used to
do this were that the disease is relatively rare and that our case
control study gives us reasonable information about the proportion
in each category of drinkers for those who have the disease and
those who do not.
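The calculation above can be sketched in a few lines of Python. This
is only an illustration under the two assumptions just stated. The
function names (risk, relative_risk, rare_disease_relative_risk) are
our own and do not come from Cornfield's paper or from the study.

```python
# Proportion of each drinking category among cases and among controls,
# taken from the case-control table above.
case_share = {"low": 0.487, "moderate": 0.365, "high": 0.148}
control_share = {"low": 0.368, "moderate": 0.403, "high": 0.229}


def risk(group, x):
    """Proportion of a drinking category that has the disease,
    when a proportion x of the whole population has it."""
    diseased = case_share[group] * x
    healthy = control_share[group] * (1 - x)
    return diseased / (diseased + healthy)


def relative_risk(group, x, reference="low"):
    """Relative risk for a given x, with no rare-disease assumption."""
    return risk(group, x) / risk(reference, x)


def rare_disease_relative_risk(group, reference="low"):
    """Limit of relative_risk as x goes to 0; x cancels out."""
    ratio = case_share[group] / control_share[group]
    ref_ratio = case_share[reference] / control_share[reference]
    return ratio / ref_ratio


if __name__ == "__main__":
    for group in ("low", "moderate", "high"):
        approx = rare_disease_relative_risk(group)
        print(f"{group:>8}: rare-disease approximation = {approx:.2f}")
        # Try several assumed prevalences x to see how far the
        # approximation drifts as the disease becomes more common.
        for x in (0.001, 0.02, 0.2):
            print(f"          x = {x}: {relative_risk(group, x):.3f}")
```

Calling relative_risk with larger values of x is one way to explore
how much the answer depends on the disease being rare.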
Thus we can estimate that moderate drinkers in the population are
68% as likely to develop Alzheimer's disease as non-drinkers and
heavy drinkers 49 percent as likely as non-drinkers. This is
close to the conclusions given in the Tribune but the percentages
are not quite the same. In fact our percentages are called crude
estimates by the researchers. The percentages 60 and 50 given in
the Tribune article are the result of carrying out a more
sophisticated analysis which controls for some of the possible
confounding factors and this solves our mystery!
(1) Given that the study suggested that heavy drinking gave the
best protection against Alzheimer's disease, why do you think the
headline of the article included: but moderation is the key?
(2) It is estimated that about 4 million people have Alzheimer's
disease so it is rare in the population as a whole. However, this
increases to between one and two people in 100 at age 65, and one
in five by age 80. How does this affect our assumption that it is
a rare disease?
(3) The article quotes one expert as saying:
This is a good epidemiological study. But like
many epidemiological studies, it raises more
questions than it answers, and that's exactly
what epidemiological studies do. They raise
questions. They don't give answers.
What questions do you think he had in mind? Do you think the
authors would agree with this assessment of epidemiological
studies? Do you?
courts void execution more than two-thirds of the time. Results
fuel debate over capital punishment.
Los Angeles Times, 12 June 2000, A1
In 1972, the US Supreme Court declared all existing death penalty
statutes unconstitutional, and the prisoners then on death row
were spared execution. Since that time, 34 states have enacted
new death penalty laws. The results of all death-penalty cases
from 1973 to 1995 are reviewed in a newly released study headed by
Columbia University law professor James Liebman. During that
period, the death penalty was imposed in 5760 cases. The study
examined all 4578 cases that have gone through the full appeals
process, which includes direct appeals of the trial record in
state court, and post-conviction reviews at the state and federal
level. Overall within this group, 68% of the death sentences were
overturned. The full report from the study is available from the web.
The researchers conclude that American capital sentences are
"systematically fraught with error that seriously undermines their
reliability." The principal errors identified involve
"egregiously incompetent" defense or unethical prosecution. In
many cases, defense lawyers failed to find and present relevant
evidence that might point to innocence or at least mitigate the
sentence. (There are anecdotal stories of lawyers showing up
drunk or sleeping at the trial.) In other cases, police or
prosecutors were aware of mitigating evidence but failed to
properly disclose it. When the cases were retried after the
reversals, 82% resulted in sentences less than the death penalty,
and another 7% ended with the defendant found not guilty.
Many observers find these results troubling. Senator Patrick
Leahy of Vermont says "There should be zero tolerance for
mistakes, not a 60%-70% tolerance. You certainly could not run a
public utility or an airline or a hospital that way."
Death penalty advocates see things differently. The L.A. Times
quotes University of Utah law professor Paul Cassell as saying:
"You will find more reversals of capital sentences...because they
are reviewed more closely. In some ways, this confirms that the
system is working as it should." In a recent letter to the editor
(We're not executing the innocent, The Wall Street Journal, 16
June 2000, A14), Cassell went further, calling the 68% error rate
a "deceptive factoid," since the study did not find a single case
in which an innocent man was executed. He further objects that
the study counts as errors those cases for which the death
sentence was re-imposed at the new trial. He writes:
Under such curious scorekeeping, the report can list 64
Florida post-conviction cases as involving "serious
errors", even though more than one-third of these cases
ultimately resulted in a re-imposed death sentence, and in
not one of the Florida cases did a court ultimately
overturn the murder conviction.
You can find the text of this letter, along with other pro-death
penalty arguments also from the web.
Study author James Liebman responded to Cassell in his own letter
to the editor (Wrong by the margin of a life, The Wall Street
Journal, 23 June 2000, A19). He writes:
Mr. Cassell says "more than a third" of cases retried
in Florida result in a new death sentence. The truth is
29%. In any case, whether it's two-thirds or more than
70% of Florida's cases that were wrong by the margin of a
person's life, does he really believe that's "close
enough for government work?"
...Mr. Cassell thinks it's our job to produce an innocent
person executed by the state. Our study, however, is
about what the courts say, not about the reliability of
American capital verdicts. And over decades and in
dozens of states the courts have found nearly 7 in 10
such verdicts too flawed to carry out. If Amtrak's
trains repeatedly crashed, would we demand a dead body
before doing something about it?
The LA Times article gave further testimony from Liebman that the
system is "broken and wasteful." Over the period of the study,
only 5% of condemned prisoners have been executed, and their
average wait on death row during the appeals process was 9 years.
Liebman sees this as evidence that the system is choking on its
own high error rates. Virginia was the only state that carried
out more than 25% of its death sentences. The state is known for
limiting the appeals process, and its 19% reversal rate was the
lowest found in the study. Liebman attributes this to "low error
detection," while Cassell sees Virginia as a model that should be
emulated by other states.
A perspective from overseas can be found in the British journal
The Economist, which has recently published three articles on the
America's death-penalty lottery, 10 June 2000, 15-16;
Dead man walking out, 10 June 2000, 21-23;
Murder one, 16 June 2000, 33.
The first two predate the release of the Columbia study. The first
laments the "random quality of capital punishment" in America.
Overall, the death penalty is sought in less than 5% of potential
capital cases. However, it is not applied uniformly; for
example, Texas imposes it forty times as often as New York. The
death penalty is sought more often when defense appears weak,
which creates a disadvantage for poor defendants.
In "Dead Man Walking Out" The Economist notes that, while there
have been 640 executions in the US since 1973, 87 prisoners have
been released in light of new evidence. It finds the rate of one
release for every seven executions troubling. Echoing earlier
comments by Senator Leahy and Prof. Liebman, the article observes
that "if an airline crashed once for every seven times it reached
its destination, it would surely be suspended immediately."
Moreover, between 1973 and 1993 an average of 2.5 death row
prisoners a year were found innocent, while from 1993 to 1999 the
rate was 4.6 a year. Noting that Texas has accounted for more
than a third of the 640 executions, the article questions Gov.
George W. Bush's confidence in asserting that all those defendants
were guilty and had full access to the courts. Indeed, last year
Bush vetoed legislation aimed at reforming procedures for poor
defendants. The bill would have formed an independent commission
to appoint public defenders and required representation within 20
The article also points out that American public support for
capital punishment now appears to be eroding. From an historical
high of 80% in favor in 1994, the support had fallen to 66% in a
February Gallup poll. And only 52% favor the death penalty when
life-without-parole is given as an option.
Dr. Frank Newport, Editor-in-Chief of the Gallup Poll, discussed
the trends in a June 21 CNN spot about the Gary Graham case in
Texas. You can view the video clip from the "Gallup on the Air"
links on the Gallup home page.
You can find the historical record of Gallup's findings on the
death penalty also on the Gallup web site.
In the most recent poll, 91% said they believed that sometime in
the last 20 years an innocent prisoner had been executed. And 65%
agreed with the statement that "a poor person is more likely than
a person of average or above average income to receive the death
penalty for the same crime."
The Economist notes that there has never been conclusive proof
that the death penalty actually deters murder. John Lamperti of
Dartmouth has written a review of the relevant literature, which
you can find in the Teaching Aids section of the Chance website.
Finally, the most recent Economist article, "Murder One," reports
on the Columbia study, which it welcomes as the "first real data"
in a debate that has been often framed in ideological or emotional
terms. Summarizing the pervasiveness of errors, the article notes
that 85% of the states which have the death penalty have error
rates exceeding 60%. Virginia is again identified as an outlier:
"For its size it executes five times as many people as other
states and reverses only a quarter the number of cases."
(1) The L.A. Times article reports that "...state courts
initially overturned 47% of the death sentences, having found
serious legal flaws. Later federal review discovered 'serious
error'...in 40% of the remaining cases, resulting in an overall
68% reversal rate nationwide." How was the overall rate computed?
(2) The article later states that "since 1975, 87 inmates have
been freed from death rows across the nation for reasons including
mistaken identification, prosecutorial misconduct or newly
discovered exculpatory evidence, including the results of DNA
tests that have led to eight exonerations." How does this relate
to the 7% figure cited for defendants found not guilty at
(3) Where does the "more than 70%" figure come from in Liebman's
response to Cassell in the Wall Street Journal? What does he mean
in his distinction between "what the courts say" and "the
reliability of American capital verdicts"?
(4) In "Dead Man Walking Out," The Economist found the release
rate per year from death row during 1993-1999 was almost twice the
rate observed during 1973-1993. Does it follow that innocent
defendants are now being sentenced at a higher rate? What else do
you need to know?
(5) What do you think of the analogies that various commentators
have drawn with safety figures for hospitals or travel?
(6) How could you try to determine if the death penalty is a
deterrent? (See Lamperti's article)
the New York Times. Mitchell Laks is a doctor-mathematician who
also holds a Harvard PhD in mathematics. Mitchell
contributed to Chance News 6.09 commenting on a baseball article
"The Kind of Sweep That's Hard to Come By." He showed us that
these sweeps might be hard to come by but they could certainly
have come by chance.
Mitchell read an article in the Times about a medical controversy
that led him to go back to the sources to see what it was all
about. This caused him to question the validity of the studies
related to the controversy. He explained his concerns briefly in
a letter to the New England Journal of Medicine:
Volume of Procedures at Transplantation Centers and Mortality
after Liver Transplantation.
Letter to the editor, 18 May 2000,
Mitchell B. Laks
Such letters must, of necessity, be quite brief. We asked Mitchell
to tell us the whole story of how he came to write the letter.
Here is his story:
Misunderstandings about the Effects of Volume on Liver
Transplantation Mortality Outcomes
The front page of the New York Times of December 29, 1999 carried
an article "Iowa Turf War Mirrors Battles on Transplants". It
highlighted the struggle of a surgeon, Dr. Maureen Martin, to
establish a new liver transplantation program at a community
hospital in Iowa. It detailed the opposition that she encountered
from the transplantation group at the University of Iowa as well
as from surgeons in nearby Nebraska who felt threatened by the
potential diversion of the limited supply of local organs from
The New York Times article put the story into the context of the
debate in Washington regarding the Clinton administration's
proposed rules for the national distribution of organs according
to "sickest-first" criteria. The intended goal of the new rules is
a broad sharing of organs across state lines. Currently organs are
allocated by geographic criteria. The article reported that this
Clinton proposal is supported by the country's seven largest
transplantation programs. On the other side of the issue is the
Patient Access to Transplantation Coalition, a collection of
several dozen mostly midsize hospitals like the University of Iowa
that have thrived under the current local allocation system. Each
side is actively lobbying their congressmen to support their
respective interests. At stake is income as well as the ability of
hospitals to attract business by projecting a cutting edge image
to their patients.
The small to mid sized transplantation centers are thus being
squeezed on both sides. On the low end they are threatened by
ambitious community hospitals which seek to start new
transplantation programs by hiring young newly trained transplant
surgeons. This threatens to erode the supply of local organs for
transplants. On the high end they are threatened by the large
volume centers like Pittsburgh and UCLA by virtue of the proposed
national "sickest first" allocation rules. The large centers are
viewed as treating the sickest patients.
A novel argument was raised by Dr. Lawrence Hunsicker, director of
transplantation at the University of Iowa in the debate over the
proposed opening of the new Iowa transplant center. He is quoted
in the New York Times as saying, "Medical research shows patients
fare best in centers that perform at least 20 transplants per
year". He says that the "number of transplants in Iowa is now
limited by the number of usable donated organs, about 40 a year".
"So if we had two programs, and we split evenly, then we would
both be at the very lowest end of the numbers necessary to
maintain competence". Thus, he argues, since 20 liver transplants
per year is the minimum for competence, a second liver transplant
center should not be opened in Iowa.
If one could establish that 20 transplants was the minimal volume
required for a transplantation center to maintain competence this
would indeed provide a good argument against the opening of new
centers, and add scientific support to the goal of small to mid
sized transplant programs in limiting the development of new
programs. On the other hand, the argument does arouse one's
skepticism. After all, even the largest transplantation centers
began initially with small numbers. The argument is an attempt to
create a barrier to entry into the field, and would require
The 12/30/1999 New England Journal article by Edwards et al (1)
from Dr. Hunsicker's group ostensibly is the basis for this
argument. Unfortunately, however, both methodological errors and
possible errors of bias significantly mar the data evaluation and
interpretation in both this paper and a similar previous paper (2)
by the senior author on cardiac transplantation.
At first glance, the study is a typical 'volume outcome' study.
Such studies are done to evaluate whether high volume centers do
better with procedures than low volume centers. The take home
message from a significant volume outcome study is that physicians
should refer their patients for procedures at the highest volume
centers. Volume outcome studies typically are structured to
compare centers with volumes in the top and bottom quartiles for
volume or those in the top and bottom 10%. For example, compare
two recent NEJM studies (3) and (4) on coronary angioplasty
procedures and survival after myocardial infarction respectively.
The recent editorial by Hannan (5) is helpful in putting volume
outcome studies into perspective.
However the authors of both (1) and (2) took a different,
nonstandard, approach. They didn't compare the highest and lowest
volume transplantation centers. Instead they picked a particular
number of transplants per year - in this study 20 liver and in the
previous study 9 heart - and compared the outcomes at centers with
annual volumes more and less than this number. What criteria did
the authors use to pick 20 or 9 transplants as the dividing
points? These are not numbers that one would be likely to select a
priori. For example, for liver transplantation, in the years
discussed in Hunsicker's article (92-93 essentially) there were
centers in the United States doing as many as 300, 250, or 150
transplants per year. If one were to pick a demarcation line a
priori, perhaps one would draw it at 100 per year (2 times per
week) or 50 per year (1 time per week) or perhaps even at 26 per
year (1 every 2 weeks). Why did they choose this unusual boundary
between their two subsets?
Did the authors have any self interest in their choices? The 1994
paper (2) on cardiac transplants reviewed data for the years 1988-
1991. The numbers of cardiac transplants done during those years at
the University of Iowa were 10, 7, 10, and 12, respectively (while
the paper (2) chose 9 as the dividing line). The paper (1) on
liver transplantation covered essentially the years 1992 and 1993,
and the numbers of liver transplants at the University of Iowa were
9 and 43 (for an average of 26 per year). So by drawing the line
where they did, the authors included the University of Iowa on
the successful side in both instances. (All transplantation
statistics are as reported by the United Network for Organ Sharing
(UNOS), and are quoted from their web site www.unos.org).
The authors employed a statistical methodology that usually
compares the smallest and the largest volume centers, but wrote
their papers to draw a line between very small centers and all the
rest. They included their own center on the better side. In
general, choosing a particular dividing point may be correct or it
may be misleading. The question becomes: what criteria did the
authors use to divide the centers with poor outcomes from those
with better outcomes?
For example, the choice the authors made has the effect of linking
centers doing 11-20 transplants with those doing 1-10. This
requires methodological justification. Let us grant the
statistical result of the paper (1), namely that for the years
that they studied there is a worse 1 year outcome for centers doing
20 or fewer transplants. If all the excess mortality resided in
transplant centers doing 1-10 transplants, then presenting the
results in this way would incorrectly characterize those centers
doing 11-20 transplants. Yet the authors do not present
an analysis comparing centers with 10 or fewer transplants with
those doing 11 or more. Nor do they report a subgroup analysis
comparing those centers doing 1-10, 11-20 and 21-30 transplants.
The standard methodology of a volume outcome study is not designed
to achieve the kind of discrimination that the authors seek to
elicit from this data. It appears that the authors are not doing a
standard volume outcome study. It would appear that they are
trying to weed out transplantation centers that are smaller in
volume than their own.
How do the authors explain their choice of the numbers 20 and 9 in
their papers? In (1) the authors justify their choice of 20
transplants per year by appealing to their Figure 1. (In the
paper (2) they use their Figure 2). This figure is a scatter plot
of mortality rate versus transplantation center volume. They state
that "mortality rates stabilized at centers that performed more
than 20 transplantations per year and increased inversely with
transplantation volumes of less than 20 per year". Unfortunately,
the authors misinterpret their figure. When the underlying
phenomenon is random, such a scatter plot is expected to show more
scatter as the sample sizes get smaller. This variability, per se,
is not a sign of poor performance at low volume sites. What the
figure shows is not an increase in mortality rates but an increase
in their variability, with wider and wider variance in the rates
for smaller and smaller transplantation volumes. There are sites
with mortality rates of 0% as well as of 100%. But this is only to
be expected with small sample sizes! Reading it as excess
mortality is a fundamental statistical error. Even if all
the transplant centers in the country, regardless of size, had
equivalent intrinsic 1 year mortality rates (for the study (1)
approximately 20.7%), the observed mortality rates at small
centers would be expected to show increased variability. The
authors miss the point that a center with small volume can have a
high observed mortality rate simply because it treated very few
patients. For example, a site with a mortality rate of 100% for 1
transplantation (or 50% for only 4) might not be criticized for
its mortality rate per se, while a site with a 30% mortality rate
that did 1000 transplants might very well be. A baseball batter
with a lifetime .367 average might strike out in one at bat without
being banished to the minor leagues.
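A rough calculation shows how unremarkable such results are at a
tiny center. The sketch below assumes, purely for illustration,
that every patient independently faces the 20.7% one-year risk
quoted above; the function name is ours.

```python
from math import comb

P_DEATH = 0.207  # overall one-year mortality quoted for study (1)

def prob_at_least(k, n, p=P_DEATH):
    """P(at least k deaths among n independent transplants)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# A center with average intrinsic risk that does very few transplants:
print(f"1 of 1 dies (100% mortality):        {prob_at_least(1, 1):.3f}")
print(f"2 or more of 4 die (50% or worse):   {prob_at_least(2, 4):.3f}")
```

Under these assumptions a perfectly average center doing 4
transplants would show a mortality rate of 50% or worse roughly one
time in five.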
Moreover, we have performed Monte Carlo simulations using those
individual cardiac and liver transplant center volumes and
assuming equivalent intrinsic mortality rates for all centers.
These simulations fully reproduced the behavior observed by the
authors in their figures 1 and 2 respectively. You can see how
beautifully the simulation mimics the data in our simulation
The visible pattern of these scatter plots of individual centers
is thus not directly related to the integrated behavior observed
when the data on centers performing small and larger volumes are
aggregated for volume outcome analysis.
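A simulation of this general kind can be sketched in a few lines.
This is not the code used for the simulations described above: the
annual volumes are hypothetical rather than the actual UNOS center
volumes, and every center is given the same intrinsic mortality of
20.7%.

```python
import random
import statistics

TRUE_RATE = 0.207  # same intrinsic one-year mortality at every center

def observed_rates(volume, n_centers, rng):
    """Simulate observed mortality rates for centers of a given volume."""
    rates = []
    for _ in range(n_centers):
        deaths = sum(rng.random() < TRUE_RATE for _ in range(volume))
        rates.append(deaths / volume)
    return rates

rng = random.Random(0)  # fixed seed so the run is repeatable
spread = {}
# Hypothetical annual volumes, not the actual UNOS center volumes.
for volume in (2, 5, 10, 20, 50, 100, 250):
    rates = observed_rates(volume, n_centers=2000, rng=rng)
    spread[volume] = statistics.pstdev(rates)
    print(f"volume {volume:3d}: mean {statistics.mean(rates):.3f}, "
          f"sd {spread[volume]:.3f}, "
          f"min {min(rates):.2f}, max {max(rates):.2f}")
```

The mean observed rate stays near 20.7% at every volume, while the
spread of the observed rates widens steadily as the volume shrinks.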
Thus, on mathematical grounds, Figure 1 offers no support to the
authors in their choice of demarcation point between
transplantation centers with good and poor outcome statistics.
Moreover, even if one mistakenly accepted their Figure 1 at face
value, their claim certainly appears debatable. To our eye, for
instance, the graph appears to "stabilize" (no statistical
definition of "stabilize" is offered by the authors) beyond 20,
perhaps at 30 transplants per year, in order to include the sites
with mortality rates above 30% and 35%. Of course, if the authors
had drawn the line of discrimination at 30 transplants per year
that would have put the University of Iowa center on the
low volume side. Their choice of 20 as the demarcation volume was
thus arbitrary, without a scientific basis, and possibly self
One final point. The authors of (1) give the data for 1 year
survival, but make no mention of the results for 3 month survival.
It is customary for volume outcome studies to offer this short
term data as well (see (3), (4), and (5)). Was this data not
offered because it didn't support the authors' thesis? We can't
find this data on the UNOS web site for 1992-93.
We, the readers of (1), are thus left with this message. During
the period 1992-1993, higher volume liver transplantation centers
did better at 1 year survival than did low volume centers. The
precise dividing point utilized by the authors for demarcating
between the high and low volume centers in (1) (or (2)) has no
scientific significance. (For example, the poorer 1 year outcomes
reported in (1) may in fact be occurring in those centers with 10
or fewer annual liver transplants, but we cannot say.) Therefore
physicians should refer their patients to high volume centers.
Of interest to referring physicians may be the data summarized in
Table 1. One can see that there is quite a variation in the
numbers of transplantations performed at different centers in the
US. For believers in volume outcome studies, it is hard to imagine
referring a patient to a center doing 20-40 transplants per year
(just over 1 every 2 weeks) when there are 6 centers doing more
than 100 per year (2 a week), and some as many as 250 per year.
Table 1. Number of Active US Transplant Centers by Volume Category

Transplants per year    Liver    Heart
100 or more                 6        1
75-100                      5        2
50-74                      14        1
40-49                      12        5
30-39                      13       12
20-29                      18       15
10-19                      17       50
0-9                        20       51
0                          10       18

One can dispute the significance of volume outcome studies in
general. As Hannan (5) points out, they only reflect average
behavior. Moreover, if the statistics are tabulated by transplant
center and not by individual surgeon and team, they may not be the
most appropriate volume indicator - perhaps the volume done by the
individual transplant surgeon and team is more important. It is
probably better to look directly at outcome statistics for centers
and surgeons - thus transplant center volume data may simply be an
easily obtainable, but possibly flawed, proxy for more significant
outcome information. Moreover, perhaps not all consumers are
swayed by volume data. I might prefer, for scientific reasons, to
refer a patient to a center doing close to 200 transplants a year,
but some patients may prefer the more personal service that they
may receive at a smaller center closer to home.
The unscientific choice of a particular volume number by the
authors of (1) and (2) should not be allowed to create an
artificial barrier to the development of new transplant centers.
It is a complex undertaking for a medical center to develop a
transplantation team. Potential transplant centers are subject to
stringent regulation, and appropriately so. However, if a
potential center makes the appropriate commitment and provides the
requisite resources of personnel and funding, then, in time, it
may outstrip an older, more established, center in outcome
measures. The older center may lose its sense of mission over
time, its key personnel may develop other primary interests and
retire from the field, and the center may fade in quality.
Unfortunately, business considerations such as marketing may also
play a role in the growth of new centers. However, in
medicine, as on Wall Street, Adam Smith's proverbial invisible
hand can continue to play a salutary role advancing the cause of
best outcomes as long as free market conditions are permitted to
It is important to guarantee the flow of accurate scientific
information. One responsibility of the medical scientific
community is to play the role for medicine that the
Securities and Exchange Commission plays for the securities
industry. We must work to ensure that there is full, truthful and
unbiased reporting of medical outcomes so that the government can
exercise appropriate informed regulatory oversight and so that
medical consumers can make the most enlightened choices in the
1. Edwards EB, Roberts JP, McBride MA, Schulak JA, Hunsicker LG.
The effect of the volume of procedures at transplantation centers
on mortality after liver transplantation. N Eng J Med
2. Hosenpud JD, Breen TJ, Edwards EB, Daily OP, Hunsicker LG. The
effect of the transplant center volume on cardiac transplant
outcome: a report of the United Network for Organ Sharing
Registry. JAMA 1994;271:1844-9.
3. Jollis JG, Peterson ED, DeLong ER, et al. The relation between
the volume of coronary angioplasty procedures at hospitals
treating Medicare beneficiaries and short term mortality. N Eng J
4. Thiemann DR, Coresh J, Oetgen WJ, Powe NR. The association
between hospital volume and survival after acute myocardial
infarction in elderly patients. N Eng J Med 1999;340:1640-1648
5. Hannan EL. The relation between volume and outcome in health
care. N Eng J Med 1999;340:1677-79
Parade Magazine, 5 June, 2000
Marilyn vos Savant
A reader writes:
These statistics were included in an article titled
"The Rich Get Richer": "Family incomes for the poorest 20%
of the population have lagged behind the gains made by
families in the top 20%. In fact 60% of families have
seen their incomes fall in real terms in the past two
decades." Do you see any flaws in these statistics?
J. J., Severn, Md.
There's plenty wrong with this kind of quintile analysis.
For example, an increase in the income of a household in
any of the lower four income quintiles may increase
averages in other quintiles more than its own quintile.
She then gives an example with ten salaries which go from $10,000
to $100,000 in increments of $10,000 and then presents a scenario
similar to this one: The $10,000 salary is for Paige, a
struggling actress in New York working as a temp. Paige's current
off-Broadway play becomes a hit and moves to Broadway resulting in
her income increasing to $100,000. Marilyn then considers how this
changes the quintiles.
[Table: the quintiles before Paige is discovered and after Paige
is discovered]
So the highest-income quintile (and the others) can be
said to have benefited at the expense of the lowest
income quintile. Of course that's a gross misunderstanding
but it's reported routinely.
As you can see, her data does not really show what she wanted it
to show. Paige's good luck has increased the average income of
each of the quintiles by the same amount, $10,000. If we assume
that Paige was such a star that she got an additional $10,000
bonus, then the average income in the top quintile does go up more
than the average of the bottom quintile, or in fact any other
Marilyn then remarks that if the $100,000 salary belonged to a
physician who happens to be an amateur painter, and the physician
sold one of his paintings for $100,000, the average income of the
top quintile would increase without any change in the lower
quintiles.
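The physician scenario is easy to check with a short sketch. It
uses the ten salaries from the example ($10,000 to $100,000 in
steps of $10,000) and assumes each quintile is simply two of the
ten ranked earners; the function name quintile_means is ours.

```python
def quintile_means(incomes):
    """Average income in each fifth of the sorted list (lowest first)."""
    ranked = sorted(incomes)
    size = len(ranked) // 5  # assumes the count is a multiple of 5
    return [sum(ranked[i:i + size]) / size for i in range(0, 5 * size, size)]

before = list(range(10_000, 100_001, 10_000))  # $10,000 ... $100,000
# The physician earning $100,000 sells a painting for another $100,000.
after = before[:-1] + [200_000]

print(quintile_means(before))
print(quintile_means(after))
```

The top quintile average rises from $95,000 to $145,000 while the
other four averages are unchanged. The same function can be applied
to other scenarios by editing the list of salaries.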
This topic has been in the news a great deal recently because of
claims by leaders of the movement against the World Trade
Organization, such as Lori Wallach, who stated:
While the macroeconomic indicators have often looked
good, real wages in many countries have declined, and
wage inequality has increased both within and between
A recent paper "Growth is Good for the Poor", by David Dollar and
Aart Kraay of the World Bank, appears to show that the data does
not support Wallach's claim.
Of particular interest to Wallach's claim is Figure 1 in this
paper. Here we see two scatter plots. The first is for the log
of the average income and the log of the average income of the
poor (lowest quintile) for the countries in the study. The
correlation is .93 and the regression line has slope 1.07. The
second graph gives the scatter plot for the average growth rate
over a period of at least five years for the country as a whole
and the average growth rate for the poor. Here the correlation is
.72 and the slope of the regression line is 1.17. These plots
suggest that the income for the poor does follow that of the
country as a whole.
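For readers who want to see how such a slope and correlation are
computed, here is a minimal sketch. The five data points are
invented for illustration and are not from the Dollar-Kraay paper,
and we assume the income of the poor is regressed on the overall
average, both on a log scale.

```python
from math import log, sqrt

# Hypothetical (average income, average income of lowest quintile) pairs.
DATA = [(1_000, 60), (2_500, 140), (5_000, 310), (9_000, 520), (20_000, 1_300)]

def slope_and_correlation(xs, ys):
    """Least-squares slope of y on x and the Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

log_mean = [log(m) for m, _ in DATA]
log_poor = [log(p) for _, p in DATA]
slope, corr = slope_and_correlation(log_mean, log_poor)
print(f"slope {slope:.2f}, correlation {corr:.2f}")
```

On a log-log scale a slope near 1 means that a 1% difference in
average income goes with roughly a 1% difference in the income of
the poor.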
(1) What problems do you see with examples of the type that
(2) Dollar and Kraay say "We measure mean income as real per
capita GDP at purchasing power parity in 1985 international
dollars." What does all this mean and why do you think they
measure income this way?
(3) What do the results in Figure 1 of the Dollar-Kraay paper
tell you about how the poor do when the economy of a country
changes? What does the fact that the slopes are nearly 1 tell you?
(4) Dollar and Kraay only consider average incomes. What more
might you want to know to decide if, as the rich get richer, the
poor really get richer?
out by mistake.
Rawlings throws open baseball plant doors.
USA Today, 24 May, 2000, 1C
Researchers Say Core Has Changed.
USA Today, 26 May, 2000, 8C
Dan Shaughnessy; Will Juiced Ball Yield Fruit?
Boston Globe, 11 May, 2000, E1
It is obvious to every baseball fan that in the major leagues,
home runs are being hit at a greater rate than ever before. In
fact, this year, records have been set for most home runs in a
day, a week, and a month, and the season is barely two months
old. The leagues are on a pace to hit 6254 home runs, which is
an increase of 13.1% over last year's record of 5528.
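The percentage can be checked directly from the two totals quoted
above:

```python
projected = 6254   # home runs on the current pace
last_year = 5528   # last year's record total

increase = (projected - last_year) / last_year
print(f"{increase:.1%}")
```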
Several different theories have been batted about to explain this
pronounced increase. One possible reason is that the ball has
been 'juiced,' i.e., it has been made so that it rebounds
faster off a bat than previously. Another possibility
is that with the increase in the number of teams in recent years,
there are many pitchers playing now who are not very good.
Still another is that ball parks are smaller than they used to
be. Finally, the hitters may be bigger and stronger.
The first of these articles describes a tour of the factory in
Costa Rica where all of the balls used in the major leagues are
made. Rawlings owns the factory and has been the sole supplier
of the major leagues since 1977. The cores are made by the
Muscle Shoals Rubber Company, in Batesville, Mississippi. This
company has been the sole supplier of this product for the entire
time that Rawlings has been making the balls (and supplied
Spalding, the previous maker, as well). A spokesperson from
Muscle Shoals is quoted as saying 'We haven't changed a thing.'
Wool yarn and cotton string are wound, by machine, around the
cores. Then, cowhide covers are hand-stitched around the balls.
The hides come from dairy cows, rather than beef cattle, because
the former have fewer imperfections. At the plant where the
cowhides are cut, the plant manager, who has been there for 11
years, is quoted as saying 'We give the very best to the major
Finally, a sample of the balls produced each day is tested at the
factory. A pitching machine is used to launch balls at 85 miles
per hour at a wooden plank made of northern white ash, which is
what bats are made of. The speed of rebound is measured, and
divided by 85 to obtain the Coefficient of Restitution, or COR.
The major leagues have specified that the COR be between 51.4%
and 57.8%. According to Rawlings, the value of the COR has not
changed in the recent past.
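The test described above amounts to a one-line calculation. In the
sketch below the launch speed and the limits are the ones quoted in
the article, while the three rebound speeds and the function names
are hypothetical.

```python
LAUNCH_MPH = 85.0
COR_MIN, COR_MAX = 0.514, 0.578  # limits quoted above, as fractions

def cor(rebound_mph, launch_mph=LAUNCH_MPH):
    """Coefficient of Restitution: rebound speed over incoming speed."""
    return rebound_mph / launch_mph

def within_spec(rebound_mph):
    """True if the ball's COR falls inside the specified range."""
    return COR_MIN <= cor(rebound_mph) <= COR_MAX

# Hypothetical rebound speeds in miles per hour.
for speed in (43.0, 46.4, 49.5):
    print(f"{speed:.1f} mph -> COR {cor(speed):.3f}, "
          f"in spec: {within_spec(speed)}")
```

At 85 miles per hour the specification corresponds to rebound
speeds between about 43.7 and 49.1 miles per hour.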
The second article reports on a collaboration between Universal
Systems, of Solon, Ohio, and the energy laboratory at Penn State.
Using a CAT scanner designed to test cores in the petroleum
industry, the researchers compared the cores used in baseballs in
the 1930s with the ones being used today. A spokesperson from
Rawlings is quoted as saying that the core hasn't changed over
this period, but the researchers report that the changes are very
significant. They stop short of claiming that this has much
effect on the home run rate.
The last article reports on a study, commissioned by Major League
Baseball, to determine if today's baseballs conform to the
established standards. The study is being conducted by Dr. Jim
Sherwood, a professor of mechanical engineering, and Larry
Fallon, a doctoral candidate, both at the University of
Massachusetts at Lowell. Bud Selig, the Commissioner of
Baseball, claims that the baseball is not being doctored. Selig
is quoted as saying 'We're not that smart.' He thinks that
ballparks are smaller, hitters are stronger, and the umpires
don't call strikes above the belt (even though the rule books
state that balls between the knees and the letters are strikes,
provided they're over the plate).
(1) Imagine a baseball league with 10 teams, say, which have the
250 best players in the world. Now imagine expanding this league
to 20 teams, which now have the 500 best players in the world.
What would happen to the number of home runs per team in a
season? What would happen to the distribution of individual
totals for home runs (i.e., would it be more likely that someone
would hit a large number of home runs)?
(2) Suppose that over time, both hitters and pitchers get
'bigger and stronger' at the same rate. If everything else were
assumed to be unchanging, would the number of home runs go up or
(3) When we asked Dr. Sherwood how they were testing the balls
we received the following answer:
We are not at liberty to disclose our test methods at
this time. This is a comprehensive study being completed
under commission to MLB (Major League Baseball). MLB has
asked us to keep all procedures and results confidential.
The methods and results will be released to the public
when it is appropriate. There are, however, standard
tests for baseballs as published by the ASTM. You are
welcome to consult these standard test procedures.
We note also that Dr. Sherwood received a grant of $390,000 from
Major League Baseball and Rawlings Sporting Goods for the
establishment of the Baseball Research Center. Would you be
concerned about a possible conflict
NOTE: We received an interesting
explanation from Dave Zavagno about the
way that Universal Systems studies baseballs.
Copyright (c) 2000 Laurie Snell
This work is freely redistributable under the terms of the GNU
General Public License as published by the Free Software
Foundation. This work comes with ABSOLUTELY NO WARRANTY.