John Finn Shunhui Zhu

1S Bradley 316 Bradley

**Class 4 Average and Standard Deviation**

**Class 5 Standard Deviation; Stephen Jay Gould on batting .400**

**Class 6 Introduction to Probability**

**Class 7 The Binomial Distribution**

**Class 9 Polling, Standard Error, Normal Distribution**

**Class 10 Standard Error, Normal Approximation; Projects**

**Class 11 Surveys and Data Collection**

**Class 12 Correlation and Regression**

**Class 13 Correlation & Regression, and Immigration Statistics**

**Class 14 Correlation & Regression; Conditional Probability**

**Class 15 Economics, Streaks and Tversky**

Welcome to

Topics that might be covered in

- Health risks of electric and magnetic fields;

- Statistics, expert witnesses, and the courts;

- The use of DNA fingerprinting in the courts;

- Randomized clinical trials in assessing risk;

- The role of statistics in the study of the AIDS epidemic;

- Paradoxes in probability and statistics;

- Fallacies in human statistical reasoning;

- The stock market and the random walk hypothesis;

- Demographic variations in recommended medical treatments;

- Informed patient decision making;

- Coincidences;

- The reliability of political polls;

- Card shuffling, lotteries, and other gambling issues;

- Scoring streaks and records in sports.

The class will differ from traditional math classes in organization as well as in content: The class meetings will emphasize group discussions, rather than the more traditional lecture format. Students will keep journals to record their thoughts and questions. Additional homework will be assigned regularly. There will be a major final project in place of a final exam.

The class meets Tuesday and Thursday from 10:00 to 11:50 a.m. in Filene Auditorium, on the first floor of Bradley/Gerry Halls.

Discussions are central to the course and usually focus on a current article in the news. They provide a context in which to explore questions in more depth and understand material better by explaining it to others.

Every member of each group is expected to take part in these discussions and to make sure that everyone is involved: that everyone is being heard, everyone is listening, that the discussion is not dominated by one person, that everyone understands what is going on, and that the group sticks to the subject.

The text for the course is Statistics, second edition by Freedman, Pisani, Purves and Adhikari (FPPA), available from the Dartmouth Bookstore and Wheelock Books. Students will also learn to use the JMP statistical package that is available from the public server as a key served application.

Each participant should keep a journal for the course. This journal will include:

- Specific assignments that you have been asked to do for your journal. These will
include questions that you are asked to think and write about, related to the
current day's discussion, the results of computer investigations, etc.

- General comments about the class; things you don't understand; things you finally
do understand. You might describe an
experience of trying to explain material from class to a friend or family member.

- Finding and commenting on news articles about topics relevant
to the course; asking us challenging questions; making
connections between what went on in class and experiences in
your own life; going to a casino and winning a lot of money.

- Anything interesting and imaginative about a chance topic.

We encourage you to cooperate with each other in working on anything in the course, but what you put in your journal should be your own alone. If it is something that has emerged from work with other people, write down who you have worked with. Ideas that come from other people should be given proper attribution. If you have referred to sources other than the texts for the course, cite them.

Journals will be collected and read on these dates:

Thursday 10 October

Thursday 24 October

Thursday 7 November

Thursday 21 November

Tuesday 3 December

**Homework**

To supplement the discussion in class and assignments to be written about in your
journals, we will assign readings from your text FPPA, together with accompanying
homework. When you write the solutions to these homework problems, you should keep
them separate from your journals. Homework assignments will be assigned once a week and
should be handed in on Thursdays.

**Final project**

We will not have a final exam for the course, but in its place, you will undertake
a major project. This project may be a paper investigating more deeply some topic
we touch on lightly in class. Alternatively, you could design and carry out your
own study. Or you might choose to do a computer-based project. To give you some ideas,
a list of possible projects will be circulated. You can also look at some previous
projects on the **Chance** Database. However, you are also encouraged to come up with
your own ideas for projects.

**Chance Fair**

At the end of the course we will hold a Chance Fair, where you will have a chance
to present your project to the class as a whole, and to demonstrate your mastery
of applied probability by playing various games of chance. The Fair will be held
during the final examination time assigned by the registrar.

**Resources**

Materials related to the course will be kept on our web site
and on Kiewit PUBLIC server (PUBLIC: Courses & Support: Academic Departments & Courses:
Math: Chance). In addition, supplementary readings will be kept on reserve in Baker Library.

1. Read the article handed out in class: "Hit the Lotto, Buy a Toaster."

2. Form two pairs within your base group.

   - Pair 1: Prepare a case for the old (high stakes) form of advertising.

   - Pair 2: Prepare a case for the new (low stakes) form of advertising.

3. Regroup in pairs within your base group.

   - Listen to each other's arguments.

   - Come up with a recommendation for the Mayor.

2. Read Chapters 1 and 2 from FPPA. Do the review exercise at the end of Chapter 2 on page 22. (Due Thursday)

Read the article "Study Finds Stunted Lungs in Young Smokers".

On **Table 1** from "Effects of Cigarette Smoking on Lung Function in Adolescent Boys and Girls" in *The New England Journal of Medicine*, 26 September 1996.

Study the table.

What conclusions would you draw from the table?

Is there any relation between maternal smoking and child smoking?

Do you think there are significant differences between boys and girls with respect to smoking?

What confounding factors might explain these differences between boys and girls?

David A. Kessler, M.D., the controversial commissioner of the Food and Drug Administration (FDA), will give a lecture Thursday 10 October at 7:30 p.m. in Cook Auditorium in Murdough Center. Kessler will examine health issues and FDA's evolving policy concerning tobacco.

1. Read the article handed out in class: "College Board Revises Test to Improve Chances for Girls"

2. Discuss the following questions:

a. What kinds of bias are discussed in the article?

b. Are they measurement biases?

Look at both Math5 Survey and data, MALS Survey and data.

**Homework:**

Chapter 3, all review exercises.

Chapter 4, Review Problems: 1-3, 5, 7-10, 13, 14.

Read Chapter 6.

**Class discussion:** PSAT modifications.

Read "College Board Revises Test to Improve Chances for Girls", by Karen W. Arenson; *The New York Times*, Wednesday 2 October 1996.

Discussion questions:

- Do you think the PSAT is biased?
- What does bias mean in this case?
- Does the PSAT measure
  - intelligence?
  - ability?
  - merit?

**Introducing Average and Standard Deviation**

**More on using JMP**

**Journal assignment**

- Learn how to use JMP, and use it to explore the Chance Class Survey data.
- Read and comment on Stephen J. Gould's "The Median Isn't the Message", *Discover* magazine, June 1985.

Read "Incarceration Is a Bargain", by Steve Hanke, in *The Wall Street Journal,* Monday, 23 September 1996.

Discussion questions:

- How do you think Mr. Leavitt came to the figure that the average criminal would cause $53,900?

- Given that most of the people in prison are there for non-violent, non-property crimes, sometimes referred to as consensual crimes (e.g., in 1994 the proportion of federal prisoners who were drug offenders was 62 percent), does that affect your reading of the article?

- How do you think the author comes to the conclusion that "violent crime would be approximately 70% higher today if our prison population had not increased since 1973; and property crime would be almost 50% more frequent"?

- What other factors might explain the decrease in violent and property crimes reported in the article?

**Standard Deviation**

**Discussion of Stephen Jay Gould's 'Why the Death of 0.400 Hitting Records Improvement of Play'** (from his *Full House*, Harmony Books, 1996).

**Journal assignment**

- Comment on the *Star Tribune*'s review of Gould's *Full House*.

Don't forget your **two article summaries**, which are due Tuesday. When you summarize an article, be sure to include the **publication** it comes from and the **date**.

**Homework:** Read chapters 13 and 14 ("What Are the Chances" and "More about Chance") in FPPA. Do the even Review Exercises in chapter 13, and odd Review Exercises in chapter 14.

- What is the birthday problem?

- What would you be willing to bet that there is a birthday match in this class?

- Determine if there are any matches.

- What is the probability that there is *not* a match in a group of four?

- What is the probability that there *is* a match?

- What is the probability of a match in general?

- How many people are needed in order to have at least a 50% probability of a match?

- What is the probability that someone in the room has *your* birthday?
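
The birthday questions above can be explored numerically. The following is a minimal sketch, not part of the original handout, assuming 365 equally likely birthdays (no leap years) and independent birthdays; the function names are made up for illustration.

```python
from fractions import Fraction

# Assumption: 365 equally likely birthdays, independent across people.

def prob_no_match(n, days=365):
    """Probability that n people all have different birthdays."""
    p = Fraction(1)
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p *= Fraction(days - i, days)
    return p

def prob_match(n, days=365):
    """Probability that at least two of n people share a birthday."""
    return 1 - prob_no_match(n, days)

def smallest_group(target=Fraction(1, 2), days=365):
    """Smallest group size whose match probability reaches the target."""
    n = 1
    while prob_match(n, days) < target:
        n += 1
    return n

def prob_someone_shares_yours(others, days=365):
    """Probability that at least one of `others` people has one fixed birthday."""
    return 1 - Fraction(days - 1, days) ** others

if __name__ == "__main__":
    print("no match among 4:", float(prob_no_match(4)))
    print("match among 4:   ", float(prob_match(4)))
    print("group size for a 50% chance:", smallest_group())
    # 30 other people in the room is a hypothetical class size.
    print("someone shares your birthday (30 others):",
          float(prob_someone_shares_yours(30)))
```

Comparing the last two results shows why a match between any two people is much more likely than a match with one particular birthday.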

**Coincidences in Airplane Crashes**

- Read the two letters from *The New York Times* about meteorites and airplane crashes.

- Which of the two writers do you think has the right approach?

- What is the relationship between these articles and the birthday problem?

**Coin Tossing Experiment**

- What is the probability of your getting a head five times in a row?

- What is the probability of someone in the class getting a head five times in a row?
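
A short sketch of the arithmetic behind these two questions, assuming a fair coin, independent tosses, and that each student tosses exactly five times. The class size of 30 is a hypothetical value, not a figure from the course.

```python
def prob_run_of_heads(k=5):
    """Probability that k tosses of a fair coin all come up heads."""
    return 0.5 ** k

def prob_at_least_one(class_size, k=5):
    """Probability that at least one of class_size students gets k heads in k tosses.

    Assumes every student tosses independently of the others.
    """
    return 1 - (1 - prob_run_of_heads(k)) ** class_size

if __name__ == "__main__":
    print("one student:", prob_run_of_heads(5))
    # 30 is an assumed class size, used only for illustration.
    print("someone in a class of 30:", prob_at_least_one(30))
```

The gap between the two numbers is the same effect seen in the birthday problem: an event that is unlikely for one named person can be quite likely for "someone".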

**Journal assignment** Think of coincidences in your own life. What is the likelihood of these being random chance occurrences? Are they really events that have occurred *against all odds*?

a. Break into groups of four.

b. Identify a member of your group who claims to be able to tell the difference between Pepsi and Coke. (Coke Classic, that is; accept no substitutes!)

c. Design an experiment to test whether this is true. Remember that one swallow doth not a summer make: Don't certify your taste-tester just on the basis of one taste. Write down exactly what data you will collect and what you will do with the data before you start collecting it.

d. What is being tested?

- When you design an experiment like this you should ask several questions. First, what do you want to test? Do you want to test if a person can tell given a single cup whether it contains Coke or Pepsi? Can a person decide which of two cups is Coke and which is Pepsi? Can a person who is given two cups simply decide if they have the same or different drinks? These are all testing slightly different abilities.

- What does your experiment test? Is that what you want it to test?

e. Carry out the experiment.

f. Record your results.

**The Binomial Distribution**

- What are the chances of getting certain results, in the Coke vs. Pepsi test, by chance alone?
- What is a binomial distribution?
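
As a rough illustration of these questions, here is a sketch of the binomial calculation for one possible design of the taste test. It assumes the taster is handed a single cup per trial, that pure guessing is right with probability 1/2, and that trials are independent; the choice of 10 trials is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

if __name__ == "__main__":
    n = 10  # assumed number of tasting trials
    for k in range(n + 1):
        print(f"exactly {k:2d} correct by guessing: {binomial_pmf(k, n):.4f}")
    print("8 or more correct by guessing:", prob_at_least(8, n))
```

A different design (for example, deciding whether two cups hold the same drink) may change the guessing probability `p`, which is one reason to fix the design before collecting data.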

Don't forget your **two article summaries**, which are due Tuesday. When you summarize an article, be sure to include the **publication** it comes from and the **date**.

**Homework assignment.** In FPPA:

- Chapter 15, odd review problems.

- Chapter 16, all review problems.

- Chapter 17, even review problems.

1. The CNN Tracking Poll for October 19-20 interviewed 732 likely voters. They reported that 55% favored Clinton, 34% favored Dole and 6% favored Perot with a sampling error of + or - 4% (sampling error is also called margin of error).

- What do you think sampling error means?

- Discuss how this fits into your group's understanding of "sampling error". Where do you think the "19 cases out of 20" comes from?

- Can the difference (9 pts vs. 25 pts) be explained by chance?

- What are some other possible explanations?

- Do you think that tracking polls are a good idea?
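
One way to connect the reported figures to the questions above is the textbook calculation for a simple random sample. This is a sketch under that assumption only; real tracking polls use weighting and likely-voter screens that this ignores. The multiplier `z = 2` corresponds to roughly 95% confidence, that is, about 19 cases out of 20.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion p from a simple random sample of size n."""
    return sqrt(p * (1 - p) / n)

def margin_of_error(p, n, z=2.0):
    """Margin of error as z standard errors (z = 2 is roughly 95% confidence)."""
    return z * standard_error(p, n)

if __name__ == "__main__":
    n = 732  # likely voters interviewed, as reported in the poll
    for name, share in [("Clinton", 0.55), ("Dole", 0.34), ("Perot", 0.06)]:
        print(f"{name}: {share:.0%} +/- {margin_of_error(share, n):.1%}")
    # Pollsters often quote one conservative figure based on p = 0.5.
    print("conservative margin:", round(margin_of_error(0.5, n), 4))
```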

(1) Read the NYT article "Misreading the Gender Gap" by Carol Tavris (September 17, 1996). What do you think of her explanation of the gender gap in the current election?

(2) How would you explain "margin of error" to a friend who had not had a statistics course?

**Speaker:** Tami Buhr from Harvard University will speak on her experiences in polling.

**Standard Error, Normal Distribution.**

**Homework Assignment:** Read Chapters 19, 20, 21.

- Chapter 19, Review Problems 1, 3-6, 10
- Chapter 20, Review Problems 2, 3, 4, 6, 7, 9, 11
- Chapter 21, Review Problems 1, 10.

(We will be discussing a couple of the ideas from Chapter 18 in class. If you miss the class or need additional support, you may want to look through that chapter.)

**Preliminary Project Proposal:**

Please hand in a separate sheet with a brief description of your project proposal next Thursday. We will talk more about this in class Wednesday.

**Journal assignment**

Comments and reflections on speaker's talk.

**Standard Error and Normal Approximation**

**by John Finn**

*And* some review of the mathematics that has come up so far. We're going to try to make firm the mathematics behind chance quantities, particularly *sums* of draws from a box.
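
For reference, a sketch of the standard box-model formulas for the sum of draws made at random with replacement: the expected value is the number of draws times the box average, and the standard error is the square root of the number of draws times the box SD. The 0-1 box and the 100 draws are made-up values for illustration, and the normal approximation is applied without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

def sum_of_draws(box, n):
    """Expected value and standard error of the sum of n draws with replacement."""
    expected = n * mean(box)
    se = sqrt(n) * pstdev(box)  # pstdev divides by the box size, not size - 1
    return expected, se

def normal_approx(low, high, expected, se):
    """Approximate chance that the sum lands between low and high (no continuity correction)."""
    dist = NormalDist(expected, se)
    return dist.cdf(high) - dist.cdf(low)

if __name__ == "__main__":
    box = [0, 1]  # hypothetical box: one ticket marked 0, one marked 1
    expected, se = sum_of_draws(box, 100)
    print("expected sum:", expected, "standard error:", se)
    print("chance the sum is within one SE:",
          normal_approx(expected - se, expected + se, expected, se))
```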

**About Your Chance Project**

Remember that you're to hand in on Thursday a brief description of your project proposal. We'll talk about ideas for projects and our policies on them.

**Guest Speaker: Nancy Mathiowetz** of the University of Maryland will speak on Surveys and Data Collection.

**Confidence Intervals and Standard Deviation**

**Homework assignment:**

- Chapter 23, Review Problems 1, 3, 6, 12
- Chapter 8, Review Problems 1, 2, 6, 7, 10
- Chapter 9, Review Problems 6, 7, 10, 13.

(If you need a review of how to plot lines, find slopes, etc., read Chapter 7)

**Journal assignment**

Don't forget your **two article summaries**, which are due Tuesday. When you summarize an article, be sure to include the **publication** it comes from and the **date**.

We read about a poll taken to estimate what percentage of the population are voting for each of several candidates. The results say that "57% are for Millard Fillmore, with a 3% margin of error". What does this mean, and how do the pollsters come up with it?
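
One angle on this question is to ask how many people a poll would need to reach a 3% margin. The sketch below assumes a simple random sample and takes the margin to be two standard errors; actual pollsters may use other designs and multipliers.

```python
from math import ceil

def sample_size_for_margin(p, margin, z=2.0):
    """Smallest simple random sample whose margin of error (z standard errors) is at most `margin`."""
    return ceil(p * (1 - p) * (z / margin) ** 2)

if __name__ == "__main__":
    # 57% support and a 3% margin are the figures quoted in the question.
    print("using p = 0.57:", sample_size_for_margin(0.57, 0.03))
    # Conservative planning value when the true share is unknown.
    print("using p = 0.50:", sample_size_for_margin(0.50, 0.03))
```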

**Class discussion:** You be the judge: did regression analysis reveal a voting fraud, and was the fraud decisive?

Read "Probability Experts May Decide Pennsylvania Vote" (*The New York Times*, 11 April 1994).

Discussion questions:

- What confounding factors might account for the anomalous election results?
- Was Professor Ashenfelter's method a reasonable way to decide the issue?
- How is a court to determine whether a proven voting fraud was decisive?
- How certain must a court be to take the extreme step of seating the nominal loser?
- How did Professor Ashenfelter arrive at 6% in his calculation of the probability that the anomaly could have occurred by chance?

**Scatter Diagrams, and Correlation and Regression**

Quantifying the degree of association between two variables:

- the scatter diagram;
- the correlation coefficient;
- the SD line;

**Guest Speaker: Prof. Richard Wright: The Statistics of Immigration**

Prof. Wright is the Chair of Dartmouth's geography department.

**Human Subjects**

1. Elizabeth Bankert, Assistant Director of Grants & Contracts here at Dartmouth, will talk about guidelines for carrying out projects that involve human subjects (which include any sort of survey), and Dartmouth's regulations on these matters.

**Correlation and Regression.**

What are the SD line and the regression line of a scatter diagram? How do we determine them, and what do they tell us about the data?
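
As a sketch of how the two lines can be computed: both pass through the point of averages; the SD line has slope ±(SD of y)/(SD of x), with the sign of the correlation r, while the regression line has slope r × (SD of y)/(SD of x). The five data points are made up for illustration.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Correlation coefficient: average product of the values in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def sd_line(xs, ys):
    """Slope and intercept of the SD line."""
    sign = 1 if correlation(xs, ys) >= 0 else -1
    slope = sign * pstdev(ys) / pstdev(xs)
    return slope, mean(ys) - slope * mean(xs)

def regression_line(xs, ys):
    """Slope and intercept of the regression line for predicting y from x."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    return slope, mean(ys) - slope * mean(xs)

if __name__ == "__main__":
    xs = [1, 3, 4, 5, 7]   # illustrative data only
    ys = [5, 9, 7, 1, 13]
    print("r =", correlation(xs, ys))
    print("SD line (slope, intercept):", sd_line(xs, ys))
    print("regression line (slope, intercept):", regression_line(xs, ys))
```

Because |r| is at most 1, the regression line is never steeper than the SD line, which is the regression effect in miniature.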

**Homework Assignment**

- Chapter 10; Review Exercises 1, 3, 4, 8.
- Chapter 12; Review Exercises 2, 8.

(If you have any questions about the Root Mean Square material, read Chapter 11).

**Correlation and Regression.**

What are the SD line and the regression line of a scatter diagram? How do we determine them, and what do they tell us about the data?

**Class discussion:** Conditional probability and false positives.

**1.** In one of Marilyn vos Savant's columns in *Parade* magazine a reader asked

Suppose we assume that 5% of the people are drug-users. A test is 95% accurate, which we'll say means that if a person is a user, the result is positive 95% of the time; and if she or he isn't, it's negative 95% of the time. A randomly chosen person tests positive. Is the individual highly likely to be a drug-user?

Marilyn's answer was:

Given your conditions, once the person has tested positive, you may as well flip a coin to determine whether she or he is a drug-user. The chances are only 50-50.

How can Marilyn's answer be correct?
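
A minimal sketch of the conditional-probability calculation, using only the figures stated in the reader's question (5% users, 95% accuracy in both directions); the function name is made up for illustration.

```python
def prob_user_given_positive(prevalence, sensitivity, specificity):
    """P(user | positive test), from Bayes' rule."""
    true_positives = prevalence * sensitivity                # users who test positive
    false_positives = (1 - prevalence) * (1 - specificity)   # non-users who test positive
    return true_positives / (true_positives + false_positives)

if __name__ == "__main__":
    print(prob_user_given_positive(prevalence=0.05, sensitivity=0.95, specificity=0.95))
```

With these numbers the users who test positive and the non-users who test positive make up equal shares of the population, which is where the coin-flip figure comes from.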

**2.** An article in *The New York Times* some time ago reported that college students are beginning to routinely ask to be tested for the AIDS virus.

The standard test for the HIV virus is the Elisa test that tests for the presence of HIV antibodies. It is estimated that this test has a 99.8% sensitivity and a 99.8% specificity. 99.8% specificity means that, in a large scale screening test, for every 1000 people tested who do not have the virus we can expect 998 people to have a negative test and 2 to have a false positive test. 99.8% sensitivity means that for every 1000 people tested who have the virus we can expect 998 to test positive and 2 to have a false negative test.

The Times article remarks that it is estimated that about 2 in every 1000 college students have the HIV virus. Assume that a large group of randomly chosen college students, say 100,000, are tested by the Elisa test. If a student tests positive, what is the chance this student has the HIV virus? What would this probability be for a population at high risk where 5% of the population have the HIV virus?

If a person tests positive on an Elisa test, then another Elisa test is carried out. If it is positive then one more confirmatory test, called the Western blot test, is carried out. If this is positive the person is assumed to have the HIV virus. In calculating the probability that a person who tests positive on the set of three tests has the disease, is it reasonable to assume that these three tests are independent chance experiments?
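
For the screening question, it can help to count expected cases in the group of 100,000. The sketch below covers a single Elisa test only, using the sensitivity, specificity, and prevalence figures quoted above; it says nothing about whether repeated tests are independent.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and P(infected | positive)."""
    infected = population * prevalence
    not_infected = population - infected
    true_positives = infected * sensitivity
    false_positives = not_infected * (1 - specificity)
    chance = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, chance

if __name__ == "__main__":
    for label, prevalence in [("college students", 0.002), ("high-risk group", 0.05)]:
        tp, fp, chance = screening_counts(100_000, prevalence, 0.998, 0.998)
        print(f"{label}: about {tp:.0f} true positives, {fp:.0f} false positives, "
              f"chance infected given positive = {chance:.3f}")
```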

**Journal assignment**

Read and comment on the Manchester, NH *Union Leader* story "Exit Poll Wrong Call in Senate Race Leaves Anger, Hurt, Red Faces". There are a couple of discussion questions at the end of the article.

**Guest speaker:** Professor Michael Knetter of Dartmouth's Economics Department will speak on the role of statistics in economics.

**Activity: recognizing streaks**

We will demonstrate a computer simulation of three coins:

- a **streaky** coin, which is more likely to come up Heads after tossing a Head, and Tails after tossing a Tail. For instance, we might rig it so that *P*(*H* | *H* on previous toss) = *P*(*T* | *T* on previous toss) = 3/4 (from which you can deduce the other two conditional probabilities);
- an **ordinary** coin, where each toss is independent of all previous tosses;
- a **vacillating** coin, which is more likely to come up *H* after *T*, and vice versa.
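
The three coins can be imitated with a short simulation. This is only a sketch, not the program used in class: each toss repeats the previous one with probability `p_repeat`. The value 3/4 for the streaky coin comes from the example above, 1/2 reproduces independent tosses, and 1/4 for the vacillating coin is an assumed value.

```python
import random

# Probability that a toss repeats the previous toss.
# 0.75 is the example given for the streaky coin; 0.25 is an assumption.
COINS = {"streaky": 0.75, "ordinary": 0.5, "vacillating": 0.25}

def toss_sequence(p_repeat, n=20, rng=None):
    """Generate n tosses where each toss repeats the previous one with probability p_repeat."""
    if rng is None:
        rng = random.Random()
    tosses = [rng.choice("HT")]  # first toss is fair
    for _ in range(n - 1):
        prev = tosses[-1]
        if rng.random() < p_repeat:
            tosses.append(prev)
        else:
            tosses.append("T" if prev == "H" else "H")
    return "".join(tosses)

def longest_streak(seq):
    """Length of the longest run of identical tosses in a non-empty sequence."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

def average_longest_streak(p_repeat, trials=1000, n=20, seed=None):
    """Average longest streak over many simulated sequences of n tosses."""
    rng = random.Random(seed)
    total = sum(longest_streak(toss_sequence(p_repeat, n, rng)) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    for name, p in COINS.items():
        print(f"{name:12s} sample: {toss_sequence(p)}  "
              f"average longest streak: {average_longest_streak(p):.2f}")
```

Looking at a few sample sequences from each coin before guessing gives a feel for how much the three kinds of output overlap.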

The streaky coin is more likely to produce *streaks* of H's and of T's (like *HHHHHHTTTT*; a streak of 6 H's followed by a streak of 4 T's) than the ordinary coin, which is in turn more likely to produce streaks than the vacillating coin.

We'll tell you the probabilities for each of the coins. Your mission, should you decide to accept it, is to repeatedly look at a sequence of 20 tosses, and guess which coin is producing it. For instance, if we get *HHHHHHHHHHTTTTTTTTHH*, you'd probably guess the streaky coin.

**Class Discussion**

Read the *New York Times* article "'Hot Hands' Phenomenon: a Myth?", on Stanford psychologist Amos Tversky's study of streaks in basketball.

Discussion questions:

- Do you personally believe that streaks in sports are real?
- What are some of the questions that Tversky says are, in a way, beside the point?

**Homework Assignment:**

- Chapter 26, Review exercises 2, 5.
- Chapter 28, Review exercises 2, 3.
- Chapter 29, Review exercises 1, 2, 4.
