!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 10.07

July 5, 2001 to August 14, 2001

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

We are now using a listserv to send out Chance News. You can sign on or off or change your address at this Chance listserv. This listserv is used only for mailing and not for comments on Chance News. We do appreciate comments and suggestions for new articles. Please send these to:

jlsnell@dartmouth.edu

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site.

====================================================
Ronnie's grades were okay in junior high school because his
multiple-choice tests was good -- in Lake Wobegon, the

Garrison Keillor (1997) Wobegon Boy p.180

====================================================

On the SAT, the answer is usually either B or C, sometimes
it is A, but only rarely is it D

Peter Kostelec's high school teacher.

====================================================

<<<========<<

>>>>>=============>
The following items are from the Forsooth column in the May 2001 issue of RSS News.

Top 5 highest vehicle theft rates by city in 2000 (The vehicle theft rate
is the number of stolen vehicles divided by the city's population, then
divided again by 100,000).

 City          No. of thefts   Theft rate
 Phoenix           29,506         976.1
 Miami             20,812         956.6
 Detroit           40,685         909.2
 Jersey City        4,502         814.4
 Tacoma             5,565         807.9

The Insurance Guide

We assume that RSS was worried about the way the rate is described here and in newspaper articles about the study. However, this information came from a study by the National Insurance Crime Bureau (NICB), which described the theft rate exactly as the Insurance Guide did, both in its study and in its press release. The only confusion comes from how you interpret (thefts/population/100,000). The comma in the definition suggests (thefts/pop)/100,000, which would be wrong. It actually means thefts/(pop/100,000) = (thefts/pop)*100,000, which could be stated more simply as the number of thefts per 100,000 residents; that is what they meant by the theft rate!
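To make the two readings concrete, here is a small Python sketch (our own illustration, not from the NICB). The table does not give populations, so the code back-calculates each city's implied population from the published count and rate; those implied figures are derived, not reported.

```python
# Two readings of "thefts / population / 100,000" (illustration only).

def theft_rate_per_100k(thefts, population):
    """Intended meaning: thefts per 100,000 residents = thefts / (population / 100,000)."""
    return thefts / (population / 100_000)


def literal_misreading(thefts, population):
    """Literal reading of 'divided again by 100,000': (thefts / population) / 100,000."""
    return (thefts / population) / 100_000


def implied_population(thefts, rate_per_100k):
    """Back-calculate the population implied by a count and a rate per 100,000."""
    return thefts * 100_000 / rate_per_100k


# (number of thefts, theft rate) from the table above
TABLE = {
    "Phoenix": (29_506, 976.1),
    "Miami": (20_812, 956.6),
    "Detroit": (40_685, 909.2),
    "Jersey City": (4_502, 814.4),
    "Tacoma": (5_565, 807.9),
}

for city, (thefts, rate) in TABLE.items():
    pop = implied_population(thefts, rate)
    print(f"{city:<12} implied population {pop:>12,.0f}   "
          f"per 100,000: {theft_rate_per_100k(thefts, pop):6.1f}   "
          f"literal misreading: {literal_misreading(thefts, pop):.1e}")
```

The literal reading produces numbers around one in ten million, which no one would publish as a "rate"; the per-100,000 reading recovers the table's figures.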

DISCUSSION QUESTIONS:

(1) The British define the vehicle theft rate as the number of thefts per 100,000 registered cars rather than per 100,000 population. Which do you think is a better definition? Why? Which would give New York City the higher rating?

(2) The Insurance Journal (Nov. 14, 2000) reports (again based on an NICB study) that the ten most commonly stolen vehicles in the United States in 1999 were:

1. Honda Accord
2. Toyota Camry
3. Oldsmobile Cutlass
4. Chevrolet Full Size Pickup
5. Honda Civic
6. Toyota Corolla
7. Jeep Cherokee
8. Chevrolet Caprice
9. Ford Taurus
10. Chevrolet Cavalier

The headline in the Augusta Chronicle (30 September 2001) was "thieves prefer the Honda Accord". What other information would you like to have to be convinced of this?
<<<========<<

>>>>>=============>
Since taking over the Probability Web, Bob Dobrow has added three new web pages:

• Quotes, a page of probability-related quotations.
• Teaching Resources, with on-line tutorials and textbooks, interactive demonstrations
and general resources for teaching a probability course.
• What's New, a summary of the most recent submissions.

Under Teaching Resources we found an interesting web page, Probability Central. Here, for example, you can practice your ability to estimate the probability of getting a good hand at draw poker. You are given a hand and asked to choose the cards you want to keep. Then the dealer gives you your new cards and tells you the probability of getting your (hopefully) improved hand. This is just one example of an ambitious program, ThinkQuest, in which students worldwide work with their teachers to provide web pages related to all areas of knowledge.
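For a flavor of the calculations behind such a page, here is a small sketch (ours, not Probability Central's code). It assumes the only cards you have seen are your own five, so 47 are unseen, and it counts the "outs," the unseen cards that would complete the hand you are drawing to.

```python
from math import comb


def prob_hit(outs, draws, unseen=47):
    """Probability that at least one of `outs` useful cards appears among `draws`
    cards dealt from `unseen` equally likely unseen cards."""
    return 1 - comb(unseen - outs, draws) / comb(unseen, draws)


# Keep four cards of one suit and draw one: 9 cards of that suit remain unseen.
print(f"four-flush, draw 1: {prob_hit(9, 1):.4f}")
# Keep an open-ended four-card straight and draw one: 8 outs.
print(f"open-ended straight, draw 1: {prob_hit(8, 1):.4f}")
```

With more than one card to draw, "at least one out" is not the same as "improved hand" for every holding, so this simple function covers only one-card draws like those shown.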

<<<========<<

>>>>>=============>
Commenting on the recent study suggesting that the placebo effect is not what it was thought to be (see Chance News 10.06), John Paulos writes in his current Who's Counting column:

Although it had nothing to do with politics, the placebo report did get me
thinking of the inclination of some politicians (and of countless other people as well)
to substitute their own will and beliefs for scientific fact. I believe it. Therefore it's true.

Paulos illustrates this with four examples:

• A study claiming to show that homosexuals can change their sexual preference through counseling.
• The confidence that government officials have that every convicted criminal condemned to die in our nation's prisons really is guilty.
• Global warming.
• The evidence that the missile defense program will work.

We recommend reading John's column. It is just a click away. If you don't find it here then try a second click here.
<<<========<<

>>>>>=============>
David Hardman called our attention to the following two articles written by Jordan Ellenberg. Jordan is in the math department at Princeton. He is a number theorist but seems to have inherited an interest in statistics from his parents, Susan and Jonas Ellenberg. Susan is at the FDA and is a recent president of the International Biometric Society, and Jonas is at Westat and is a recent president of the ASA. Both gave great lectures at the Chance Lectures 2000.

Jordan is writing a semi-regular column called Do the Math for the electronic newspaper Slate. We will give a brief summary of his first two columns but again we recommend reading the real thing which is just one click away.

Cigs and Figs
How do you count dead smokers?
Slate, 14 June, 2001
Jordan Ellenberg

Jordan describes a recent TV ad:

Three teen-agers bungee out of an airplane while chugging their favorite
soda. Two of the boys positively beam with satisfaction. The other boy's
head explodes. Tobacco, the ad concludes, is the only product that kills
one in three people who use it.

Jordan comments that this is a very effective ad, since almost everyone knows three people who smoke. But he wonders what "one in three" means. He gives six possible meanings and comments that there are many more. He then looks for the source of this "one in three" and traces it to the Centers for Disease Control's Nov. 9, 1996 Morbidity and Mortality Weekly Report, which states:

Of young American adults who smoke, we project that 55 percent will
become lifetime smokers, and there is a 50 percent chance that they will
suffer a smoking-attributable death. The other 45 percent who will quit
sometime in their adult life have a 10 percent chance of suffering a
smoking-attributable death.

Thus 50% of 55% plus 10% of 45% is 32% or about one in three.
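The calculation is the law of total probability, splitting young smokers into lifetime smokers and eventual quitters. A short check with exact fractions (variable names are ours):

```python
from fractions import Fraction as F

lifetime_smokers = F(55, 100)  # share projected to smoke for life
p_death_lifetime = F(50, 100)  # chance of a smoking-attributable death for them
quitters = F(45, 100)          # share who quit sometime in adult life
p_death_quitter = F(10, 100)   # chance of a smoking-attributable death for them

p = lifetime_smokers * p_death_lifetime + quitters * p_death_quitter
print(p, float(p))
```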

Then Jordan turns to the question: what's a "smoking-related death"? As an example, he supposes that studies have shown the relative risk of heart disease for smokers to be 2.5. Then, if there are 100,000 heart-disease deaths among smokers (over some time period), we would expect only 100,000/2.5 = 40,000 such deaths if these people had not smoked. Hence 60,000 of the 100,000 deaths among the smokers could be attributed to smoking. Of course, you have to do this for every cause of death associated with smoking. He refers us to Mortality From Smoking in Developed Countries, 1950-2000 (R. Peto et al.) for this information. A detailed explanation of how these estimates are made, and how they are combined into an overall estimate of the deaths attributable to smoking, can be found in Chapter 3 of the 1989 Surgeon General Report: Reducing the Health Consequences of Smoking. This report concludes that in 1985 smoking accounted for 22 percent of the deaths of men aged 20 or older and 11 percent of the deaths of women. More recent estimates can be found in later Surgeon General reports on tobacco; the most recent is Women and Smoking: A Report of the Surgeon General (2001).
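The relative-risk argument amounts to saying that a fraction (RR - 1)/RR of the deaths among smokers from a given cause is attributable to smoking; with RR = 2.5 that fraction is 0.6. A minimal sketch of the arithmetic follows. The second cause and its numbers are hypothetical, included only to show the summing over causes mentioned above; real estimates such as those in the Surgeon General reports are considerably more involved.

```python
def attributable_deaths(deaths_among_smokers, relative_risk):
    """Deaths among smokers in excess of what the same people would be expected
    to suffer as non-smokers, when smokers' risk is `relative_risk` times theirs."""
    expected_without_smoking = deaths_among_smokers / relative_risk
    return deaths_among_smokers - expected_without_smoking


print(attributable_deaths(100_000, 2.5))  # Jordan's heart-disease example

# Repeat for every smoking-related cause and add up (second entry is hypothetical).
causes = {
    "heart disease": (100_000, 2.5),
    "hypothetical cause X": (20_000, 10.0),
}
total = sum(attributable_deaths(d, rr) for d, rr in causes.values())
print(total)
```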

DISCUSSION QUESTION:

What information about smoking-related diseases would you need to determine the percentage of deaths that can be attributed to smoking?
<<<========<<

>>>>>=============>
Barry Bonds and the placebo effect.
Slate, 12 July, 2001
Jordan Ellenberg

Like Paulos, Ellenberg starts with the recent study on the placebo effect (See Chance News 10.06) and explains how regression to the mean can account for the placebo effect. Then he turns to baseball and reminds us that Barry Bonds has hit more home runs before the All-Star break than he, or anyone, ever has before. He writes:

But Barry Bonds isn't going to hit 72 home runs for the same reason
that there might be no such thing as the placebo effect.
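Regression to the mean is what ties the two stories together: whoever leads at the break is, on average, someone whose first half was helped by luck as well as talent. A minimal simulation sketch, entirely under our own hypothetical assumptions (100 hitters whose true per-game home-run rates are drawn uniformly between 0.05 and 0.25 and stay fixed all season, at most one home run per game, 88 games before the break and 74 after), finds the first-half leader and compares his second-half rate with his first-half rate.

```python
import random


def simulate_leader_ratio(n_players=100, first=88, second=74, seasons=200, seed=1):
    """Average of (second-half HR per game) / (first-half HR per game) for the
    first-half home-run leader, in a hypothetical league where every player's
    true per-game rate stays the same all season."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(seasons):
        # Hypothetical true talent: per-game home-run probabilities.
        talent = [rng.uniform(0.05, 0.25) for _ in range(n_players)]
        # Each game: one home run with probability = talent, else none (a simplification).
        first_half = [sum(rng.random() < p for _ in range(first)) for p in talent]
        leader = max(range(n_players), key=first_half.__getitem__)
        second_half = sum(rng.random() < talent[leader] for _ in range(second))
        ratios.append((second_half / second) / (first_half[leader] / first))
    return sum(ratios) / len(ratios)


print(round(simulate_leader_ratio(), 2))
```

Because no one's true rate changes in this model, the drop in the leader's rate is pure selection; the exact number printed depends on the made-up talent distribution and is not meant to reproduce Jordan's two-thirds.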

Jordan provides data giving, for every non-strike year since 1934, the player with the most home runs at the time of the All-Star game. For each of these players he gives the number of home runs at the time of the All-Star game and at the end of the season. He remarks:

The average ratio between the hitters' home runs per game in the second half
and their home runs per game in the first half was approximately two-thirds. So
a reasonable guess--a reasonable statistical guess, that is, taking into account
no knowledge about the properties of baseball--would be that Barry Bonds,
having hit 39 home runs after 88 games, will get 39/88*2/3*74, or about 22
more. Sixty-one home runs is a formidable total but no longer a record in
the present Age of the Dinger.
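Jordan's back-of-the-envelope projection fits in a one-line function; the inputs are the figures in his quote (39 home runs in 88 games, 74 games left, and the historical two-thirds ratio). The function name is ours.

```python
def project_second_half(hr_first, games_first, games_left, ratio=2 / 3):
    """First-half home-run rate, shrunk by the historical second-half/first-half
    ratio, times the number of games remaining."""
    return hr_first / games_first * ratio * games_left


extra = project_second_half(39, 88, 74)
print(round(extra, 1), 39 + round(extra))
```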

See Chance News 7.07 for further discussion of estimating the number of home runs at the end of the season from the number at the time of the All-Star game.

DISCUSSION QUESTION:

How could Jordan have used regression to estimate the number of home runs Barry Bonds would end up with?
<<<========<<

>>>>>=============>
Capital punishment and homicide rates.
The New York Times, Sunday Q&A, July 22, 2001
Raymond Bonner

Sunday Q&A is a weekly feature of the Times, and this brief piece is an answer to the question:

Is there a noticeable difference in homicide rates between states that
have the death penalty and those that do not?

Bonner writes,

The short answer is yes. The homicide rate in states with the death penalty
has been 50 to 100 percent higher than the rate in states without it, a New
York Times study found last year.

This study was discussed in the Times (States with no death penalty share lower homicide rates, New York Times, September 22, 2000, Raymond Bonner and Ford Fessenden) but the study itself does not seem to be available.

Bonner provides other statistics that refute the argument that the death penalty acts as a deterrent to homicide. For example, of the twelve states that do not have the death penalty, 10 have homicide rates below the national average. Also, of the seven states with the lowest homicide rates, five do not have the penalty, and of the 27 states with the highest homicide rates, 25 have the death penalty. The September, 2000, article contains a more complete account of the various arguments--both for and against the death penalty--that are typically proposed.
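How surprising would these counts be if death-penalty status were unrelated to homicide rates? One way to gauge this, which is our own null model and not part of Bonner's article, is to treat the 12 states without the death penalty as if they were a random 12 of the 50 and compute hypergeometric probabilities. Small values say the pattern is unlikely to be a coincidence of which states happen to lack the penalty; they say nothing about cause and effect.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k marked items in a random sample of n from N items, K of them marked)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K = 50, 12  # 50 states, 12 without the death penalty

# Five or more of the 7 lowest-homicide states lack the death penalty.
p_low = sum(hypergeom_pmf(k, N, K, 7) for k in range(5, 8))
# Two or fewer of the 27 highest-homicide states lack it.
p_high = sum(hypergeom_pmf(k, N, K, 27) for k in range(0, 3))
print(f"{p_low:.4f} {p_high:.4f}")
```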

For a very clear review of research on the relationship between capital punishment and homicide rates, and of the statistical issues involved, see also "Does Capital Punishment Deter Murder? A brief look at the evidence," by John Lamperti.

DISCUSSION QUESTIONS:

(1) Which of the three statistics provided above (in the middle paragraph) do you think most strongly refutes the argument that capital punishment deters crime?

(2) Lamperti describes research that studied whether the death penalty provides "short-term deterrence" against homicide. Homicides that occurred within 60 days before and 60 days after five highly publicized executions were analyzed and the rates were compared. What do you think the research hypothesis was? What do you think actually happened? (Read Lamperti's paper and find out!)
<<<========<<

>>>>>=============>
How Bush took Florida: mining the overseas absentee vote.
The New York Times, 15 July, 2001
David Barstow and Don Van Natta Jr.

In the aftermath of the presidential election, the New York Times conducted a six-month investigation of the overseas absentee vote in Florida. Of particular interest were the strategies of Democrats and Republicans in regard to these ballots, and the extent to which county canvassing boards were persuaded--mostly by Republicans--to include flawed ballots in their totals.

The Times analyzed 2,490 overseas ballots for irregularities such as no proof that the ballot was mailed by election day (Nov. 7), no postmark, no witness signature, or receipt after the overseas ballot deadline of Nov. 17. The ballots did not indicate the actual vote cast, but researchers could determine the county of residence of the voter. In a companion graphic to the article ("Absentee Ballots: A Catalog of Flaws") there are examples of several ballots that violated Florida's election laws or rules, and for each example there is a chart displaying the distribution of the ballots among voters in counties won by Bush or by Gore. For instance, counties that Bush won received a total of 474 overseas absentee ballots that were mailed after election day or had no postmark, while Gore counties received 282 such ballots. Through efforts by Republicans, a much larger share of flawed overseas ballots was accepted in Bush counties than in Gore counties. In the example above, counties won by Bush accepted 294, or 62%, while counties won by Gore accepted 50, or 18%. In all, 680 ballots had one or more flaws that violated Florida statutes or rules.
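The percentages are simple ratios. As a rough gauge of the size of the gap, the sketch below also computes a two-proportion z statistic. That test is our addition, not the Times', and it treats ballots as independent even though acceptance decisions were made by county canvassing boards, so read it only as a sign that the difference is far too large to be chance at the level of individual ballots.

```python
from math import sqrt

# (accepted, received) for ballots mailed after election day or lacking a postmark
bush_counties = (294, 474)
gore_counties = (50, 282)

p1 = bush_counties[0] / bush_counties[1]
p2 = gore_counties[0] / gore_counties[1]

# Pooled two-proportion z statistic (assumes independent ballots; see caveat above).
pooled = (bush_counties[0] + gore_counties[0]) / (bush_counties[1] + gore_counties[1])
se = sqrt(pooled * (1 - pooled) * (1 / bush_counties[1] + 1 / gore_counties[1]))
z = (p1 - p2) / se
print(f"Bush counties {p1:.0%}, Gore counties {p2:.0%}, z = {z:.1f}")
```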

In order to estimate the effect that inclusion of these ballots had on the outcome of the election, the Times retained Gary King, a professor of Government at Harvard. [There is a brief summary in the Times entitled "How the Ballots Were Examined."] Although actual presidential choices could not be determined from the ballots, the overseas vote in a few counties was known to be 100% for Bush or 100% for Gore, so for these counties, at least, the effect of the flawed ballots could be determined exactly. It was also possible to construct a 100% confidence interval for the election outcome in Florida with the flawed ballots removed: its two extremes are a Gore win by 126 votes and a Bush win by 936 votes. The "actual" outcome lies somewhere between these extremes, and Professor King estimated that Gore's chance of winning was around 1%. (See King's comments, below, for more details.)

Kosuke Imai (a grad student here) and I are writing up an academic paper
on the subject, but it's not ready yet (when it is, it will be available at my
homepage.) The basic story is that on election day Florida reported that
202 more people voted for Gore than Bush, but after counting the overseas
absentee ballots, it was Bush by 537. The Times reporters found 680 ballots
that should have been disqualified. So we observed the number of bad
absentee ballots by county and the number of absentee ballots that were
cast for Gore by county, and we needed to estimate the number of bad
ballots that had been cast for Gore, also by county.

What we did was to use the methods in my book, A Solution to the Ecological
Inference Problem: Reconstructing Individual Behavior from Aggregate Data
(Princeton University Press, 1997), a method proposed in Ori Rosen, Wenxin Jiang,
Gary King, and Martin A. Tanner, "Bayesian and Frequentist Inference for
Ecological Inference: The R x C Case," Statistica Neerlandica, in press (preprint
available here), and several other methods of ecological inference proposed
in the literature.

But we couldn't stop there since we knew that this was about the most
partisan issue in existence. Thus, we felt we needed more than merely a
good specification that we could defend to reasonable people. We needed
something that was somewhat more resistant to choices about model
specification. So we did formal Bayesian model averaging, and included
in the set of models to be averaged every reasonable model we could
think of and every one that we thought anyone else might think of. The
Bayesian model averaging procedure then produced a single estimate
by weighting each of the models based on the evidence in the data
(the marginal likelihoods). It was a good application for model averaging,
we thought, because generalizing the model to encompass the range of
possibilities was infeasible.

Assuming the Times reporters did their job well (and no one seems to
have claimed otherwise), we found that the actual election margin was
somewhere between 126 for Gore and 936 for Bush (this is a 100%
confidence interval, i.e., certainty). That is, we cannot be sure who won
the election. We then estimated the likely margin within this interval. Our
estimate was that if the ballots had been correctly counted, Gore would
have had only about a 1% chance of having actually won the election.

The NY Times only reported the above estimates, but you might also be
interested in some "What Ifs" from the same analysis: If Katherine Harris
had accepted the 176 ballots that Palm Beach County asked her to accept,
instead of rejecting them because they were 2 hours late, and if the bad
[...] of having won. If, instead of the 176 ballots, we go with 215, which is
what the Democrats say that Palm Beach was ready to forward if Harris
had been amenable, then the probability goes to almost 0.5. Add a traffic
jam or two in the right places on election day and everything would have
been different.
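A 100% interval of the kind King describes can come from deterministic limits that are standard in ecological inference, often called the method of bounds. We believe this is the idea behind the interval, but we have not seen King and Imai's calculation, and the sketch below is ours. It assumes every overseas ballot was a vote for Bush or Gore and uses hypothetical counties (starting from the 537-vote margin, but with made-up county figures, so the output is not the Times' interval). In each county the number of flawed Gore ballots can be no more than the flawed count or Gore's total, and no less than what remains after all of Bush's ballots are counted as flawed; summing these limits over counties bounds the statewide margin. A county that voted entirely for one candidate, like the third one below, has its effect determined exactly.

```python
def gore_bad_bounds(total, gore, bad):
    """Smallest and largest possible number of flawed ballots cast for Gore in a county
    with `total` overseas ballots, `gore` of them for Gore, and `bad` flawed ones."""
    bush = total - gore
    return max(0, bad - bush), min(bad, gore)


def margin_bounds(reported_bush_margin, counties):
    """Range of Bush's margin after removing every flawed ballot. Removing a Bush ballot
    lowers the margin by 1 and removing a Gore ballot raises it by 1, so removing `bad`
    ballots, x of them Gore's, changes the margin by 2*x - bad."""
    lo_x = hi_x = bad_total = 0
    for total, gore, bad in counties:
        lo, hi = gore_bad_bounds(total, gore, bad)
        lo_x += lo
        hi_x += hi
        bad_total += bad
    return (reported_bush_margin - bad_total + 2 * lo_x,
            reported_bush_margin - bad_total + 2 * hi_x)


# (total overseas ballots, Gore overseas ballots, flawed ballots): hypothetical counties
counties = [(100, 30, 20), (50, 45, 10), (80, 0, 5)]
print(margin_bounds(537, counties))
```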
<<<========<<

>>>>>=============>
Guess Where: The Position of Correct Answers in Multiple-Choice.
Seek Whence: Answer Sequences and Their Consequences in Key-Balanced Multiple Choice Tests.
Working papers, Center for Rationality and Decision Theory, Hebrew University, June 2001
Yigal Attali and Maya Bar-Hillel

The authors are interested in biases that students have when answering multiple-choice questions and that testers have in determining the positions of the correct answers in the answer keys. They review the literature on this subject and the results of numerous experiments carried out by them and others to understand the effect of these biases. We found the discussion of the effect on college entrance exams especially interesting and will limit our discussion to this aspect of their work.

The authors consider multiple-choice exams for college entrance: in the US, the SAT given by the College Board and the ACT given by the American College Testing Program, and, in Israel, the PET given by the National Institute of Testing and Evaluation (NITE). Assume that each question on one of these tests has 4 possible answers, a, b, c, d. The authors are interested in two questions: When students guess at an answer, do they choose at random among the four answers? And when test makers decide in which position to put the correct answer, do they choose at random? The answer to both questions is no.

The authors first consider the question quite generally for multiple choice exams. They show by examples and by experiments that both those taking tests and those making up answer keys tend to favor middle answers for individual questions and to avoid apparent streaks in the position of the answers over the entire test. For example, in a test of 25 questions, each of which has answers a,b,c,d, they would tend to have a higher proportion of b and c answers than a and d. Also they would tend to not have streaks longer than 3, such as 4 correct answers in the same position in a row.

Those making up the college entrance exams have apparently recognized the tendency to choose middle answers and have tried to avoid this by a process called "balancing the test." The authors show that, even after balancing, the SAT exams do not completely avoid the middle bias and give evidence to show that students, when guessing, can take advantage of this to increase their scores by 10 to 16 points. They base this on experiments with the following simple strategy: answer the questions for which you know the answer and count the number of answers in each position. See which position has the fewest correct answers and, for your guesses, give the answer in this position. They call this the Underdog strategy.
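Why the Underdog strategy helps is easiest to see when the key is balanced: positions that turn up least often among the answers you know must turn up more often among the ones you don't. The simulation below is a hypothetical illustration, not the authors' experiment. It assumes a perfectly balanced 25-item key (6 a's, 6 b's, 6 c's and 7 d's in random order) and a student who knows 15 answers chosen at random and applies the Underdog rule to the other 10. Pure random guessing would be right 25% of the time.

```python
import random
from collections import Counter

POSITIONS = "abcd"


def underdog(known_answers):
    """Position appearing least often among the known answers (ties go to the earliest letter)."""
    counts = Counter(known_answers)
    return min(POSITIONS, key=lambda p: counts[p])


def simulate(trials=5000, n_known=15, seed=2):
    """Fraction of correct guesses when the student uses the Underdog rule on every
    question he does not know, for a hypothetical perfectly balanced key."""
    rng = random.Random(seed)
    base_key = list("a" * 6 + "b" * 6 + "c" * 6 + "d" * 7)
    hits = guesses = 0
    for _ in range(trials):
        key = base_key[:]
        rng.shuffle(key)
        known = set(rng.sample(range(25), n_known))
        guess = underdog(key[i] for i in known)
        for i in range(25):
            if i not in known:
                guesses += 1
                hits += key[i] == guess
    return hits / guesses


print(round(simulate(), 3))
```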

The authors ask: why do the testers not simply randomize when choosing the position for the correct answer? They observe that the testing literature often states that testers should randomize but then gives a very specific set of rules, such as "no runs greater than 3," which are not consistent with randomization. That the SAT and ACT tests do not randomize is apparently a well-kept secret. However, the authors persuaded NITE to randomize its PET answer keys in the future, and so NITE was willing to reveal its previous method of balancing the answer key. The PET exams consist of two quantitative subtests (25 questions), two verbal subtests (26 questions), and two English subtests (30 questions). Their rules for balancing a 25-question subtest were:

• No position should appear in the subtest key more than 9 times or fewer
than 4 times.
• Correct answers should never be placed in the same position more than three times in a row.
• A sequence of about half the length of the subtest (i.e., about a dozen consecutive items) should not lack any of the 4 positions.
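The three NITE rules are easy to check mechanically. A sketch, in which the window length of 12 is our reading of "about a dozen consecutive items" and the function name is ours:

```python
def balance_violations(key, positions="abcd", min_count=4, max_count=9,
                       max_run=3, window=12):
    """List the ways an answer key (a string such as 'abdc...') breaks the
    three balancing rules; an empty list means the key passes."""
    problems = []
    # Rule 1: each position appears between min_count and max_count times.
    for p in positions:
        c = key.count(p)
        if not min_count <= c <= max_count:
            problems.append(f"'{p}' appears {c} times")
    # Rule 2: no run of the same position longer than max_run.
    run = 1
    for i in range(1, len(key)):
        run = run + 1 if key[i] == key[i - 1] else 1
        if run == max_run + 1:  # report each over-long run once
            problems.append(f"run of more than {max_run} '{key[i]}' ending at item {i + 1}")
    # Rule 3: every window of `window` consecutive items contains all positions.
    for start in range(len(key) - window + 1):
        missing = set(positions) - set(key[start:start + window])
        if missing:
            problems.append(f"items {start + 1}-{start + window} lack {''.join(sorted(missing))}")
    return problems


print(balance_violations("abcd" * 6 + "a"))
print(balance_violations("aaaa" + "bcd" * 7)[:2])
```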

By considering published answer keys for the SAT the authors say that it appears that the College Board uses very similar rules for their balancing.

It is natural to ask why testers avoid genuine randomization for their answer keys. One answer is that students, like most people, do not understand that streaks can and often do happen in random sequences. Thus, if a student has chosen answer b for the first 3 questions and believes that the answer to question 4 is also b, he might change his answer, thinking that a streak of 4 is impossible. Of course, this example poses a much more serious problem if, through articles like these, the testers' balancing methods become known. Then the student will know that he has to change one of his first 4 answers and will waste a lot of time trying to figure out which one to change.
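How often does a genuinely random key break the "no more than three in a row" rule? A short dynamic-programming sketch (our own) computes the exact probability that a uniformly random key of n items over 4 positions contains a run of 4 or more, by counting the keys that end in a run of each allowed length.

```python
from fractions import Fraction


def prob_long_run(n, m=4, longest_allowed=3):
    """Exact probability that a uniformly random key of n >= 1 items over m positions
    contains a run longer than `longest_allowed` of the same position."""
    # ending[k] = number of keys (so far) with no long run whose final run has length k + 1
    ending = [m] + [0] * (longest_allowed - 1)
    for _ in range(n - 1):
        total = sum(ending)
        # A new letter either differs from the last (m - 1 ways, run resets to 1)
        # or repeats it (run grows by 1; runs already at the limit cannot grow).
        ending = [total * (m - 1)] + ending[:-1]
    return 1 - Fraction(sum(ending), m ** n)


print(float(prob_long_run(25)))
```

Calling prob_long_run(25) gives the answer for a 25-question subtest; for n = 4 it returns 1/64, the chance that all four answers land in the same position.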

The authors remark that game theory suggests the wisdom of randomizing. They view the test as a game between the student and the tester. The student wants to maximize the chance of getting it right when he guesses, and the tester wants to minimize this. The authors do not discuss the game theory approach in detail, but we felt that it was a great example for those teaching game theory, so we worked it out in some detail. Here is how the game would be set up.

For a single question with four possible answers a, b, c, d, the tester determines where he will place the correct answer, and the student either skips the question or chooses one of the possible answers as the correct answer. The payoff matrix depends on whether there is a penalty for a wrong answer. There is no such penalty on the ACT and PET exams, so for these tests the payoff matrix for the student (rows: the student's choice; columns: the position of the correct answer) is the rather trivial matrix:

          a     b     c     d
 skip     0     0     0     0
 a        1     0     0     0
 b        0     1     0     0
 c        0     0     1     0
 d        0     0     0     1

On an SAT exam with 4 possible answers, a student loses 1/3 of a point for a wrong answer. Thus the payoff matrix for such an SAT exam is more interesting:

          a      b      c      d
 skip     0      0      0      0
 a        1    -1/3   -1/3   -1/3
 b      -1/3     1    -1/3   -1/3
 c      -1/3   -1/3     1    -1/3
 d      -1/3   -1/3   -1/3     1

In either case, a strategy for the student is a set of probabilities for the possible choices skip, a, b, c, or d, where skip means don't answer the question, a means choose answer a, and so on. For the tester, a strategy gives the probabilities of placing the correct answer in position a, b, c, or d. If the tester makes them all equal, we say he randomizes.

This is a zero-sum game, meaning that whatever one player gains the other loses. For such games, Von Neumann and Morgenstern proved that there is a value v in the following sense: there is a strategy for the student that guarantees an expected winning of at least v, and a strategy for the tester that limits the student to an expected winning of at most v, no matter what the student does.

Now for the ACT and PET, the student can guarantee an expected value 1/4 by guessing and the tester can limit the student to this expected value by randomizing. Thus the value of this game is 1/4. Therefore, for the ACT or PET exams, the student should certainly guess rather than skip a question.

Consider now the game for an SAT test with 4 choices. Assume that the tester randomizes and the student guesses. Then each of the sixteen combinations of guess and answer position occurs with probability 1/16. For 4 of these sixteen combinations the student wins 1, and for the other 12 he loses 1/3. Thus his expected winning is:

4*(1/16) - 12*(1/3)*(1/16) = 0.

It is easy to check that randomizing and guessing are optimal strategies, so the value of this game is 0.
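The check is mechanical: a pair of strategies is optimal when what the student's mix guarantees against every placement equals the most any student choice can earn against the tester's mix. Since a mixed strategy's payoff is an average of pure-strategy payoffs, it is enough to look at pure choices. A sketch with exact fractions (function names are ours) covering both the no-penalty game and the SAT game:

```python
from fractions import Fraction as F

POSITIONS = "abcd"


def payoff(penalty):
    """Student's payoff matrix: rows are the student's choices, columns the tester's placement."""
    rows = {"skip": [F(0)] * 4}
    for s in POSITIONS:
        rows[s] = [F(1) if s == t else -penalty for t in POSITIONS]
    return rows


def check(matrix, student_mix, tester_mix):
    """Return (least the student's mix earns against any placement,
    most any pure student choice earns against the tester's mix)."""
    guarantee = min(sum(student_mix[r] * matrix[r][j] for r in matrix) for j in range(4))
    cap = max(sum(tester_mix[j] * matrix[r][j] for j in range(4)) for r in matrix)
    return guarantee, cap


guess = {"skip": F(0), **{p: F(1, 4) for p in POSITIONS}}  # guess uniformly, never skip
randomize = [F(1, 4)] * 4                                 # tester randomizes
print(check(payoff(F(0)), guess, randomize))     # ACT / PET
print(check(payoff(F(1, 3)), guess, randomize))  # 4-choice SAT
```

Both calls return a pair of equal numbers, 1/4 for the ACT/PET game and 0 for the SAT game, which is exactly the statement that guessing and randomizing are optimal with those values.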

Thus, while the ACT and PET encourage guessing, the SAT tries to give no reward on average for guessing. It seems that they do this to discourage guessing, but if they were really randomizing there would be no reason not to guess, and if they were not, as these papers have shown, there would be good reason to guess intelligently.

Standard advice given to students taking the SAT is: if you can rule out one of the four choices, you should guess. If, for example, a student knows that b cannot be the correct answer, the matrix for the game becomes:

          a      c      d
 skip     0      0      0
 a        1    -1/3   -1/3
 c      -1/3     1    -1/3
 d      -1/3   -1/3     1

If the tester randomizes, the three remaining choices are equally likely. If the student guesses among these his expected winning will be:

3*(1/9) - 6*(1/3)*(1/9) = 3/9 - 2/9 = 1/9.

Thus it does pay to guess in this case.
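The same calculation works for any number of remaining choices, assuming the tester randomizes so that the correct answer is equally likely to be any choice not ruled out. A sketch (the function name is ours):

```python
from fractions import Fraction


def guess_value(remaining, penalty=Fraction(1, 3)):
    """Expected score from guessing uniformly among `remaining` choices, exactly one
    of which is correct, when a wrong answer costs `penalty`."""
    return Fraction(1, remaining) - Fraction(remaining - 1, remaining) * penalty


for r in (4, 3, 2):
    print(r, guess_value(r))
```

With nothing ruled out the value is 0, as in the game above; ruling out one choice gives 1/9, and ruling out two gives 1/3.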

The student is really doing a kind of Bayesian analysis here and presumably could use this approach to develop a more sophisticated guessing strategy that takes into account his estimates of the probabilities for the position of the correct answer.
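One way to make this concrete, as a hypothetical sketch rather than anything the authors propose: if the student assigns probability p to a position, answering it is worth p - (1 - p)/3 = (4p - 1)/3 on a 4-choice SAT item, so he should answer the most likely position when its probability exceeds 1/4 and skip otherwise.

```python
from fractions import Fraction


def best_action(probs, penalty=Fraction(1, 3)):
    """Given the student's probabilities that the correct answer is in each position,
    return the choice with the highest expected score, or 'skip' if no guess has a
    positive expected score. Answering position i is worth p_i - (1 - p_i) * penalty."""
    best, best_value = "skip", 0
    for pos, p in probs.items():
        value = p - (1 - p) * penalty
        if value > best_value:
            best, best_value = pos, value
    return best, best_value


F = Fraction
print(best_action({"a": F(1, 5), "b": F(3, 10), "c": F(3, 10), "d": F(1, 5)}))
print(best_action({p: F(1, 4) for p in "abcd"}))
```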
<<<========<<

By a vote of 125 to 19, the New York state legislature has outlawed the use of hand-held cellular phones while driving. Three New York counties had previously enacted local bans, and polling results indicated that a substantial majority of New Yorkers favored such legislation. Nevertheless, in response to the concerns of wireless companies, the legislation also called for an analysis of the causes of accidents over the next four years to determine the impact of cell phones.

Despite the apparently strong sentiment against drivers using phones, the article notes that there is not a large body of scientific evidence on the effects of cell phones. A 1997 study in the New England Journal of Medicine concluded that driving with a cell phone elevated the risk of an accident by a factor of four. We discussed the New York Times coverage of this study in Chance News 6.03, and in 6.10 we reported that CHANCE Magazine had an article by the authors of the study, which gives a nice discussion of the "case-crossover design" employed in the study. This article is available on-line from the CHANCE Magazine web site.
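The case-crossover idea can be illustrated with a matched-pair calculation: each driver serves as his own control, comparing phone use just before the crash with use in an earlier control period. The sketch below uses hypothetical counts and shows the textbook matched-pair odds ratio for a yes/no exposure, with an approximate 95% interval; the published study included refinements that this sketch omits.

```python
from math import exp, log, sqrt


def matched_pair_odds_ratio(hazard_only, control_only, z=1.96):
    """Odds ratio and approximate confidence interval from a 1:1 case-crossover design.
    hazard_only: drivers on the phone just before the crash but not in the control period;
    control_only: on the phone in the control period but not just before the crash.
    Drivers with the same exposure in both periods carry no information here."""
    or_hat = hazard_only / control_only
    se = sqrt(1 / hazard_only + 1 / control_only)  # standard error of log(odds ratio)
    return or_hat, exp(log(or_hat) - z * se), exp(log(or_hat) + z * se)


print(matched_pair_odds_ratio(60, 20))  # hypothetical counts
```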

Critics of the new law argued that cell phone users were being unfairly singled out for attention. They noted that there are many other sources of distraction at the wheel, including eating, interacting with children, adjusting the radio and applying make-up. An American Automobile Association (AAA) study lent some weight to this argument. It examined reports of 32,302 accidents during the period 1995-1999 and found that only 42 of the accidents could be linked to cell phone use. The study concluded that car radios posed a much larger problem.

In response to the article, Dr. Donald Redelmeier, one of the authors of the NEJM study, wrote a letter to the editor (The New York Times, 28 June 2001, A26) to clarify his findings. He pointed out that the four-fold increase in risk was "above and over the 'baseline' driving risks from listening to the radio, talking to passengers and other usual activity."

For comic relief, you may enjoy the opinions of NPR's Cartalk brothers, Tom and Ray, who truly dislike cell phones. Their collected wisdom is archived on their "Drive Now Talk Later" web page.

You'll find links to scientific evidence including the NEJM and AAA studies. Under "nonscientific evidence" you can read anecdotal reports from Cartalk listeners who have observed silly behavior involving cell phones.

DISCUSSION QUESTIONS:

(1) One might wonder why New York didn't study its own accident data before passing the legislation. What problems might have existed with existing data? Can you anticipate any problems with the data in years to come?

(2) At first glance, it may seem impossible to reconcile the AAA data reported here with the NEJM conclusion. The NEJM study looked at 699 drivers who had cell phones and were involved in collisions resulting in property damage but not personal injury. Billing records were used to compare the drivers' cell phone use on the day of the collision with use during the week prior to the collision. Compare the resulting measure of risk with the AAA analysis.
<<<========<<

>>>>>=============>
Victim poll on violent crime finds 15% drop last year.
The New York Times, 14 June 2001, A16.
Fox Butterfield

This article presents two conflicting reports on trends in violent crime. The Justice Department's newly released National Crime Victimization Survey (NCVS) found a 15% drop in violent crime last year, the largest single-year drop since the survey was begun in 1973. On the other hand, at the end of May the FBI released its Uniform Crime Report (UCR), which indicated that violent crime held steady last year, after 8 consecutive years of decline (see "U.S. crime figures were stable in 2000 after 8-year drop," The New York Times, 31 May 2001).

Because the UCR and NCVS had tracked each other during previous years of decline, the article raises the possibility that one of them has gone awry this year. It points out, however, that the discrepancy may be partly attributable to differences in measurement methodology. The NCVS is based on interviews with crime victims, while the UCR is compiled from reports by law enforcement agencies. The NCVS covers rape, sexual assault, robbery, aggravated assault and simple assault. It does not cover murder, which is covered by the UCR. On the other hand, the UCR does not include simple assault.

Of course, the Justice Department is aware of these differences. On its web site, you can read an explanation of the different aspects of crime reflected in the NCVS and UCR, and of how the two measures complement each other.

One advantage cited for the NCVS is that it includes crimes that are not reported to law enforcement agencies. On the other hand, since it is based on survey data, it is subject to sampling error.

In an article entitled "Good News! More People are Reporting Crimes," (Washington Post, 15 July 2001, B3), Iain Murray of the Statistical Assessment Service (STATS) argued that this year's discrepancy between the NCVS and UCR reflects a healthy trend. He hypothesizes that crime has declined to a level at which it is no longer accepted as the norm, so that people now report crimes that might have gone unreported when crime was more pervasive. He cites figures showing significant increases in the reporting of crime last year, especially in minority communities. You can find the text of Murray's article on the STATS web site.

DISCUSSION QUESTIONS:

(1) In describing the NCVS, the Times explains that "simple assaults accounted for 61.5 percent of all violent crime in the new survey, and because they declined by 14.4 percent in 2000 compared with 1999, they accounted for most of the large drop in violent crime." But the overall drop was 15 percent. Does this make sense?

(2) The Department of Justice web site states that "trend data in NCVS reports are described as genuine only if there is at least a 90% certainty that the measured changes are not the result of sampling variation." What does this mean?

(3) What do you think of Murray's explanation for the discrepancy? Can you think of a way to test this theory?
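The arithmetic behind questions (1) and (2) can be checked directly. The Python sketch below is one way to do it and is not the Times's or the Justice Department's own calculation. The function names and all rates and standard errors in it are hypothetical.

For question (1), it uses only the three percentages quoted: simple assaults were 61.5% of violent crime in 2000, they fell 14.4%, and violent crime fell 15% overall. It normalizes 1999 violent crime to 1.

For question (2), it assumes that "90% certainty" means a two-sided z-test at the 10% level. It also treats the two years' estimates as independent, which is a simplification, since the NCVS's actual standard errors come from its complex sample design.

```python
from math import sqrt
from statistics import NormalDist


def simple_assault_breakdown(share_2000=0.615, sa_decline=0.144, total_decline=0.15):
    """Question (1): back out 1999 and 2000 levels from the three quoted percentages.

    1999 violent crime is normalized to 1.0. Returns the fraction of the overall
    drop contributed by simple assaults and the implied fractional decline of all
    other violent crimes.
    """
    total_1999 = 1.0
    total_2000 = total_1999 * (1 - total_decline)
    sa_2000 = share_2000 * total_2000          # simple assaults were 61.5% of the 2000 total
    sa_1999 = sa_2000 / (1 - sa_decline)       # undo the 14.4% decline
    other_1999 = total_1999 - sa_1999
    other_2000 = total_2000 - sa_2000
    share_of_drop = (sa_1999 - sa_2000) / (total_1999 - total_2000)
    other_decline = (other_1999 - other_2000) / other_1999
    return share_of_drop, other_decline


def change_is_genuine(rate_1, se_1, rate_2, se_2, confidence=0.90):
    """Question (2): two-sided z-test for a change between two estimates.

    Assumes the estimates are independent and approximately normal (a
    simplification of the NCVS's design-based variance estimates).
    """
    diff = rate_2 - rate_1
    se_diff = sqrt(se_1 ** 2 + se_2 ** 2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 at 90%
    z = diff / se_diff
    return abs(z) > z_crit, z


if __name__ == "__main__":
    share, other = simple_assault_breakdown()
    print(f"simple assaults: {share:.1%} of the drop; other violent crime fell {other:.1%}")
    # Hypothetical rates per 1,000 persons with hypothetical standard errors.
    print(change_is_genuine(30.0, 1.2, 27.0, 1.2))
    print(change_is_genuine(30.0, 1.2, 28.0, 1.2))
```

Run as written, it reports that simple assaults account for about 58.6% of the drop. The remaining violent crimes would have had to fall about 15.9%, slightly faster than simple assaults did. With the hypothetical rates, a drop from 30.0 to 27.0 gives z of about -1.77 and passes the 90% rule, while a drop to 28.0 gives z of about -1.18 and does not.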
<<<========<<

>>>>>=============>
The following articles all relate to the recent controversy over ways to control the length of baseball games.

Call more strikes, umpires are told.
New York Times, 14 July, 2001
Murray Chass

Looking at pitch counts is unfair, umpires say.
New York Times, 15 July, 2001
Murray Chass

Now even umpires are arguing over number of balls and strikes.
New York Times, 16 July, 2001
Murray Chass

Players might join umpires in dispute over pitch counts.
New York Times, 17 July, 2001
Murray Chass

The poison threatens the umpires.
New York Times, 18 July, 2001
Dave Anderson

New York Times, 18 July, 2001, A23, Editorial desk
Jim Bouton

Baseball retreats in dispute over umpires' pitch counts.
New York Times, 19 July, 2001
Murray Chass

Sandy Alderson, baseball's chief of operations, decided to try to shorten baseball games by telling the umpires to "hunt for strikes." The aim was to reduce the number of pitches in games for which they were the plate umpire.

The umpires viewed this as an attempt to get them to manipulate the strike zone. They had already been instructed early in the season to adhere more strictly to the traditional strike zone, which had shrunk unofficially over the years. Officials wanted the umpires to go back to the rule book and call more strikes on high pitches especially. They also wanted strikes called on pitches at the bottom of the zone and on its inside corner. The strike zone has a long and interesting history; columnist George Will described it in his April 1 column.

Alderson said that a two-month study found a "correlation between very high pitch counts and a misapplication of the strike zone." He said, "we've averaged in the 280's and if people called strikes that are there we'd average in the 270's. I've told those averaging around 310, that's unacceptable and it's evidence of a very small strike zone and we have videotape to support it." You can see some data and graphics relating to pitch counts and strikes here.

The umpires clearly did not like the idea of the baseball commissioner telling them how to call balls and strikes. Legendary umpire Bill Klem was once asked whether a pitch was a ball or a strike. He replied, "It ain't nothing until I call it."

In his op-ed article, Jim Bouton (pitcher for the New York Yankees, 1962-1968) writes that every umpire develops his own version of the strike zone. That version depends on specific factors such as a player's height and posture. He remarks: "What matters to the players--both pitchers and batters--is consistency. If they know an umpire's tendencies, they can adjust."

The umpires' union filed a grievance. In response, the commissioner's office declared that it would not use pitch counts and averages to judge an umpire's performance. The union then withdrew its grievance and the episode ended.

DISCUSSION QUESTIONS:

(1) In the July 18 article we read

Alderson rejected the notion that he wanted umpires to call pitches
other than what they are. He pointed out that baseball is spending more
than $1 million to install cameras at various parks to "precisely identify
strikes and balls." Baseball would hardly "undermine that by telling umpires
to call strikes where they don't exist," he said.

How do you think the umpires feel about that?

(2) What do you think the role of statistics should be in trying to settle an argument like this?
<<<========<<

>>>>>=============>
Putting Chinese to the test: learning language may help boost SATs.
Boston Globe, 7 July, 2001
Anand Vaishnav

This article discusses a recent study by Chieh Li of Northeastern University and Ronald L. Nuttall of Boston College. The study aimed to show that experience in learning to write Chinese helps students on the SAT math exam. It appeared in the Mathematics Education Research Journal, Vol. 13, No. 1, pp. 15-27, published by the Mathematics Education Research Group of Australasia.

The authors observe that students learning to write English are exposed to linear motions: they follow a line when writing the smaller letters, and so on. Students learning to write Chinese work with a series of boxes. Each character must be placed in its box, and its strokes must bear certain relations to the horizontal and vertical sides of the box and to the other strokes.

A previous study by these authors showed that knowledge of written Chinese is related to success on spatial problems as measured by standard tests such as the Piagetian Water Level Task. This task requires drawing a line to represent what the water level looks like when a half-filled glass of water on a table is tilted at a series of angles. Success is judged by how close the lines are to horizontal.
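The scoring rule in the last sentence can be made concrete by measuring the angle between a drawn water line and the true horizontal. The Python sketch below is a hypothetical scoring scheme and not necessarily the one used in these studies. It assumes each drawn line is recorded by two endpoints, in coordinates where the table top is the x-axis. A subject is scored by the mean deviation over all tilt angles, and lower is better.

```python
from math import atan2, degrees


def deviation_from_horizontal(x1, y1, x2, y2):
    """Angle in degrees (0 to 90) between a drawn line and the horizontal.

    Coordinates are assumed to have the table top as the x-axis, so a correct
    water line is parallel to it no matter how the glass is tilted.
    """
    angle = abs(degrees(atan2(y2 - y1, x2 - x1)))  # 0 to 180
    return min(angle, 180 - angle)                 # drawing direction does not matter


def mean_deviation(lines):
    """Average deviation over a subject's lines, one per tilt angle (lower is better)."""
    return sum(deviation_from_horizontal(*line) for line in lines) / len(lines)


# Hypothetical responses: one level line, and one drawn at 45 degrees
# (for example, parallel to the base of a glass tilted 45 degrees).
responses = [(0, 0, 10, 0), (0, 0, 10, 10)]
print(mean_deviation(responses))
```

Taking the smaller of the angle and its supplement means a line drawn right-to-left scores the same as one drawn left-to-right.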

On the basis of this and other such studies, the authors conjectured that experience in learning to write Chinese would also help on questions on the mathematics SAT that require spatial reasoning. They then carried out a study to test this hypothesis.

To make subjects and controls as similar as possible, the authors chose as subjects Chinese-Americans who were educated in American schools and grew up in American culture. They advertised for subjects in the Boston area. They obtained a Chinese-writing group of 42 undergraduates (19 males, 23 females) and a non-Chinese-writing group of 108 undergraduates who could not write Chinese characters. Subjects were asked to report their SAT scores, and a sample of the reports was checked to verify their validity. Subjects were also given two spatial reasoning tests.

Female Chinese writers averaged 703 out of a possible 800 on the math section of the SAT, while females who could not write Chinese averaged 629. Male Chinese writers averaged 645 on the math section, compared with 613 for males who could not write Chinese. These results showed significantly higher SAT math scores for students who could write Chinese.
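The group differences above are easy to tabulate. The short Python sketch below uses only the four reported means and the 19 male / 23 female split of the Chinese-writing group. It computes the writer/non-writer difference within each sex, the female-minus-male gap within each group, and the overall mean of the Chinese-writing group. The dictionary and function names are illustrative only.

```python
# Mean SAT math scores reported for the four groups.
MEANS = {
    ("writer", "female"): 703,
    ("writer", "male"): 645,
    ("nonwriter", "female"): 629,
    ("nonwriter", "male"): 613,
}
WRITER_COUNTS = {"female": 23, "male": 19}  # Chinese-writing group sizes


def writer_advantage(sex):
    """Chinese writers' mean minus non-writers' mean, within one sex."""
    return MEANS[("writer", sex)] - MEANS[("nonwriter", sex)]


def female_minus_male(group):
    """Female mean minus male mean, within one group."""
    return MEANS[(group, "female")] - MEANS[(group, "male")]


def writer_group_mean():
    """Overall mean of the Chinese-writing group, weighting each sex by its size."""
    total = sum(WRITER_COUNTS.values())
    return sum(n * MEANS[("writer", sex)] for sex, n in WRITER_COUNTS.items()) / total


for sex in ("female", "male"):
    print(sex, "writer advantage:", writer_advantage(sex))
for group in ("writer", "nonwriter"):
    print(group, "female minus male:", female_minus_male(group))
print("Chinese-writing group mean:", round(writer_group_mean(), 1))
```

The writers' advantage is 74 points among females but only 32 among males, and in this sample females outscored males in both groups (by 58 and 16 points). These are differences between sample means. No standard deviations are given here, so the sketch cannot itself check the significance claim. The non-writing group's overall mean cannot be computed this way because its male/female split is not reported.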

The authors also considered alternative hypotheses and used multiple correlation analyses to test them. Consistent with their previous work, they found a positive correlation between students' math scores and their scores on the spatial reasoning tests.

Note that the gender gap that shows up in SAT scores for American students does not seem to appear for the Chinese students. The authors remark that recent studies have shown that, indeed, there is no gender gap in mathematics testing in China or Japan.

DISCUSSION QUESTION:

What other explanations might be given for the difference in performance of the two groups?
<<<========<<

>>>>>=============>