## Class 9: Coke vs. Pepsi: Analyzing the Results.

When you design an experiment like this you should ask several questions. First, what do you want to test? Do you want to test whether a person can tell, given a single cup, whether it contains Coke or Pepsi? Can a person decide which of two cups is Coke and which is Pepsi? Can a person, given two cups, simply decide whether they hold the same or different drinks? These all test slightly different abilities.

You might think of the experiment as trying to settle an argument, say between Linda and Laurie. Linda claims she can tell the difference between Pepsi and Coke, and Laurie claims that Linda cannot. Now, Linda does not claim she can do it every time in a series of tests, but rather that she can get it right more often than she would just by guessing. There are two kinds of errors we can make in our experiment. The first, called a type I error, is that Linda could establish her claim when in fact she is just guessing and was lucky. The second, called a type II error, occurs if Linda really does have the ability she claims but just has a bad day and does not get enough correct to establish her claim. Laurie wants to be sure that the chance of a type I error is small, and Linda wants to make sure the chance of a type II error is small.

Let's consider first the group 2 experiment. The experimenters gave a single taster 6 cups known to contain either Pepsi or Coke. The taster was not told how many had Coke and how many had Pepsi. In fact 3 cups contained Coke and 3 contained Pepsi.

Now suppose we required that the taster get all 6 correct to establish the claim.

Since the taster was not told how many cups contained Coke and how many contained Pepsi, it is reasonable to assume that if the taster was just guessing what a single cup contained, they would have a 50% chance of being correct. It is harder to say just what having the ability means. One simple solution is to ask the taster what percentage of the time they would expect to get it right. If the answer is 80% then we might say that if the taster's claim is correct, they would have an 80% chance of being correct on a single cup.

Then the probability of a type I error is (1/2)^6 = .016. The probability of a type II error is 1 - .8^6 = .74. This shows that requiring all 6 correct is unfair to the taster. Thus we should consider changing the requirement to getting, say, 5 or more correct.
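The error rates above, and the effect of relaxing the requirement to 5 or more correct, can be checked with a short binomial calculation. This is a minimal Python sketch, assuming each cup is an independent trial with success probability 0.5 under guessing and 0.8 under the taster's claim. The function names are illustrative, not part of the class materials.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def error_rates(threshold, n=6, p_guess=0.5, p_claim=0.8):
    """Return (type I, type II) error probabilities for the rule
    'the claim is established if at least `threshold` of n cups are correct'."""
    type_1 = prob_at_least(threshold, n, p_guess)      # a guesser passes by luck
    type_2 = 1 - prob_at_least(threshold, n, p_claim)  # a skilled taster fails
    return type_1, type_2

for threshold in (6, 5):
    t1, t2 = error_rates(threshold)
    print(f"need {threshold} of 6: type I = {t1:.3f}, type II = {t2:.3f}")
```

Under these assumptions, lowering the bar to 5 of 6 cuts the type II error from about .74 to about .34, while raising the type I error from 1/64 to 7/64 (about .11).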

********************************************************************
We should consider this kind of analysis for all the experiments. Here is the information we have about the experiments that were carried out in class last time.

Group 1: One taster did three sets of three cups. Each set contained one cup of Coke, one of Pepsi, and one of RC. Assume the order was always P, C, RC in each case.

actual:

```
P, C, RC
P, C, RC
P, C, RC
```
taster reported:
```
RC, C, P
RC, C, P
P, RC, C
```
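To judge Group 1's result, it helps to know how many cups a pure guesser would label correctly in one set of three. This is a minimal Python sketch, assuming the taster knows each set holds one cup of each drink and so reports some ordering of the three labels, with all 6 orderings equally likely under guessing. That assumption is not stated in the notes, although every reported answer is an ordering of P, C, RC.

```python
from collections import Counter
from itertools import permutations

actual = ("P", "C", "RC")

def matches(reported, truth=actual):
    """Number of positions where the reported label equals the true one."""
    return sum(r == t for r, t in zip(reported, truth))

# Distribution of the number correct over all 6 equally likely orderings.
guess_counts = Counter(matches(order) for order in permutations(actual))
for k in sorted(guess_counts):
    print(f"{k} correct: {guess_counts[k]} of 6 orderings")

# The taster's three reported sets, copied from the data above.
reported_sets = [("RC", "C", "P"), ("RC", "C", "P"), ("P", "RC", "C")]
print("taster's scores:", [matches(r) for r in reported_sets])
```

Under this assumption a guesser averages one correct per set, which is exactly what the taster scored in each of the three sets.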

*******************************************************************
Group 2: A single taster was given 6 cups total: 3 of Coke, 3 of Pepsi. The taster was not told how many cups there were of each.

actual:
```
P    C    P    P    C    C
```
taster reported:
```
P    C    C    P    P    C
```
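Group 2's result can be scored against the analysis at the start of these notes. This is a minimal Python sketch that counts the matches and computes the chance that a pure guesser (50% per cup, cups independent, as assumed earlier) would do at least as well.

```python
from math import comb

actual   = "P C P P C C".split()
reported = "P C C P P C".split()

n = len(actual)
correct = sum(a == r for a, r in zip(actual, reported))

# P(at least `correct` right out of n) when each cup is a fair coin flip.
p_guess = sum(comb(n, k) for k in range(correct, n + 1)) / 2**n

print(f"{correct} of {n} correct")
print(f"chance a guesser gets {correct} or more: {p_guess:.3f}")
```

Four correct falls short of a "5 or more" requirement, and under these assumptions a guesser would do at least this well about a third of the time.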
********************************************************************
Group 3: There were three tasters. Each taster was given 6 cups and not told how many cups contained Coke and how many contained Pepsi. The experimenters deliberately used 4 cups of one drink and 2 of the other. The same cups were used for each taster.

actual:
```
C    C    P    C    C    P
```
taster 1:
```
P    C    P    C    C    P
```
taster 2:
```
P    C    C    C    C    P
```
taster 3:
```
P    C    P    C    C    P
```

********************************************************************
Group 4:

The taster was given 3 sets of 2 cups.

actual:
```
P,C    P,C    C,P
```
taster:
```
C,P    C,P    C,C
```

********************************************************************
Group 5:

This group had two tasters, each given 3 sets of 2 cups. The content of each cup was determined by picking one piece of paper out of two folded pieces, one of which said Pepsi and the other said Coke (so one pair of cups could consist of two Cokes, two Pepsis, or one of each). The tasters were told how the contents of the cups were picked.

actual:
```
P,C    C,P    C,P
```
taster 1:
```
P,C    C,P    C,P
```
taster 2:
```
P,C    C,P    C,P
```

Note: We have decided to hold the chance fair, where you present
your project, on the last day of the reading period, Tuesday, May 14,
instead of during the final exam period.

Monday we will have a guest speaker, John Paulos, author of the
best-selling book "A Mathematician Reads the Newspaper".

Linda's Laborious Solutions to the Discussion Questions from Class 6

I'll try to work the discussion problems and see if that helps you do the first
journal question for Class 7 (which is supposed to be the same idea, with
different numbers).

These written solutions are no substitute for coming to precepts or office
hours or discussing the problems among yourselves, and I encourage you to do
these things, too.

Here is how I work out the discussion problems.

• 1) the drug test

Suppose you have a large group of people who take the drug test, say X people.
(You can plug in 100,000 for X if you prefer.) These people can be divided
into 4 categories:

• i) drug users who test positive on the drug test
• ii) drug users who test negative on the drug test
• iii) non-users who test positive on the drug test
• iv) non-users who test negative on the drug test

I'll figure out how many are in each category.

• i) there are about .05*X drug users, and about 95% of them test positive, so
there are about .95*.05*X drug users who test positive
• ii) there are .05*X drug users and 5% of them test negative, so there are
.05*.05*X drug users who test negative
• iii) there are .95*X non-users, 5% of which test positive, so there are
.05*.95*X non-users who test positive
• iv) there are .95*X non-users, 95% of which test negative, so there are .95*.95*X non-users who test negative

Now, if you test positive for drug use, you must be in group i) or iii). There
are a total of .95*.05*X + .05*.95*X = 2*.95*.05*X people in groups i) and iii)
(since there is no overlap, I can just add the numbers). Of these people, only
the people in group i) are actually drug users. There are .95*.05*X people in
group i). So out of the 2*.95*.05*X people who test positive for drug use,
.95*.05*X actually use drugs. If you test positive, the chance that you use
drugs is (.95*.05*X) / (2*.95*.05*X), or 1/2.
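The same bookkeeping can be done numerically. This is a minimal Python sketch with X = 100,000, using the rates from the solution above (5% of people use drugs, and the test gives the right answer 95% of the time for both users and non-users). The variable names are illustrative.

```python
X = 100_000          # size of the tested group
user_rate = 0.05     # fraction who use drugs
accuracy = 0.95      # chance the test gives the right answer

users = user_rate * X
non_users = X - users

# The four non-overlapping categories from the solution.
groups = {
    "i) users, positive":       accuracy * users,
    "ii) users, negative":      (1 - accuracy) * users,
    "iii) non-users, positive": (1 - accuracy) * non_users,
    "iv) non-users, negative":  accuracy * non_users,
}
for name, count in groups.items():
    print(f"{name}: {count:.0f}")

# Everyone who tests positive is in group i) or iii).
positives = groups["i) users, positive"] + groups["iii) non-users, positive"]
p_user_given_positive = groups["i) users, positive"] / positives
print("P(user | positive) =", round(p_user_given_positive, 3))
```

Groups i) and iii) come out the same size, which is why a positive result leaves the chance of drug use at only one half.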

• 2) Now I'll try to do the second discussion question the same way. If I remember right, the problem suggests you look at a large sample of college students, say 100,000. I could just use X instead of 100,000, as above, but I will try using 100,000.

Again, there are 4 categories:

• i) students in this sample who have HIV and test positive
• ii) students in this sample who have HIV and test negative
• iii) students in the sample who don't have HIV and test positive
• iv) students in the sample who don't have HIV and test negative

I'll figure out how many people are in each category

• i) about .002*100,000 students in the sample have HIV, and about 99.8% of them test positive, so there are about .998*.002*100,000 students in this category
• ii) .002*100,000 students in the sample have HIV and .2% test negative, so
there are .002*.002*100,000 students in this category
• iii) .998*100,000 students in the sample don't have HIV, and .2% of them
test positive, so there are .002*.998*100,000 students in this category
• iv) similarly, there are .998*.998*100,000 students in this category

Again, if you test positive for HIV, you must be in groups i) or iii). There are a total of .998 * .002 * 100,000 + .002 * .998 * 100,000 people in groups i) and iii) since the two groups don't overlap. Only those in group i) actually have HIV. There are .998*.002*100,000 people in group i). So if you test positive, your chances of having HIV are .998*.002*100,000/(.998 * .002 *100,000 + .002 * .998 * 100,000) or 1/2 again.

Someone asked in class about the probability that a person who tests negative is really HIV-free. Since there are .002*.002*100,000 + .998*.998*100,000 people who test negative (groups ii and iv), and .998*.998*100,000 of them are HIV-free (group iv), this probability comes out as (.998*.998*100,000)/(.002*.002*100,000 + .998*.998*100,000), or about .999996.
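Both probabilities for the college sample can be checked exactly. This is a minimal Python sketch using exact fractions, assuming as above that 0.2% of the sample has HIV and that the test is correct 99.8% of the time whether or not a person has HIV. The function name `predictive_values` is illustrative.

```python
from fractions import Fraction

def predictive_values(prevalence, accuracy, n=100_000):
    """Split n people into the four categories and return
    (P(HIV | positive test), P(no HIV | negative test))."""
    infected = prevalence * n
    healthy = n - infected
    true_pos = accuracy * infected          # group i
    false_neg = (1 - accuracy) * infected   # group ii
    false_pos = (1 - accuracy) * healthy    # group iii
    true_neg = accuracy * healthy           # group iv
    return (true_pos / (true_pos + false_pos),
            true_neg / (true_neg + false_neg))

ppv, npv = predictive_values(Fraction("0.002"), Fraction("0.998"))
print("P(HIV | positive)    =", ppv)           # exact fraction
print("P(no HIV | negative) =", float(npv))
```

Calling the function with a different prevalence shows how strongly the first probability depends on who is being tested.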

Recall that there is a delay between the time a person is infected with HIV and the time the infection would show up on a test. Let's say this is 3 months. Then this last probability should be interpreted as the probability that a person who tests negative today did not have HIV three months ago. People who have reason to be concerned about recent contact with the virus are encouraged to be retested at a later time.

Now, suppose you are in a different risk group in which 5% of the people have HIV. Taking a large random sample of this group (say 100,000) and dividing up into the same 4 categories, you have again:

• i) people in this sample who have HIV and test positive
• ii) people in this sample who have HIV and test negative
• iii) people in the sample who don't have HIV and test positive
• iv) people in the sample who don't have HIV and test negative

The numbers work out differently in this case.

• i) .998*.05*100,000
• ii) .002*.05*100,000
• iii) .002*.95*100,000
• iv) .998*.95*100,000

So if you are in groups i) or iii), then the chance you are in group i) is .998*.05*100,000/(.998*.05*100,000 + .002*.95*100,000) = .998*.05/(.998*.05 + .002*.95) = .0499/.0518 = ~ .96 = 96% (pretty high)
---------------------------

The journal question should be about the same, though you have to figure out some of the numbers to use yourself. The answer should come out somewhere in between 50% and 96%.

Linda

Recall that P(A|B) means the probability of A given that B is true.

The AIDS example, in terms of conditional probability, amounts to the following: You are given

P(+test | HIV positive)

and

P(- test | HIV negative)

(both .998 in our case) and you want to know

P(HIV positive | + test)

In general, P(A|B) is not equal to P(B|A).

For example, consider two tosses of a coin and let A be the event that both tosses are heads and B the event that the first toss is a head. Then P(A|B) = .5 and P(B|A) = 1.
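The coin example is small enough to check by listing all four outcomes. This is a minimal Python sketch, assuming a fair coin so that the four outcomes are equally likely. The helper `cond_prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely

def A(o):
    return o == ("H", "H")     # both tosses are heads

def B(o):
    return o[0] == "H"         # the first toss is a head

def cond_prob(event, given):
    """P(event | given), by counting equally likely outcomes."""
    given_outcomes = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in given_outcomes), len(given_outcomes))

print("P(A|B) =", cond_prob(A, B))   # 1/2
print("P(B|A) =", cond_prob(B, A))   # 1
```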

To find one of these conditional probabilities from the other you also need to know P(A) and P(B) (actually their ratio is sufficient). In the AIDS example this amounts to knowing the probability that the patient is HIV positive before the test is performed.
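The relationship referred to here is Bayes' rule, P(A|B) = P(B|A) * P(A) / P(B). This is a minimal Python sketch that applies it to the coin example and to the HIV numbers used earlier (0.2% of the sample infected, test correct 99.8% of the time). The function name `bayes` is illustrative.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: return P(A|B) from P(B|A) and the ratio P(A)/P(B)."""
    return p_b_given_a * p_a / p_b

# Coin example: A = both tosses are heads, B = first toss is a head.
coin = bayes(Fraction(1), Fraction(1, 4), Fraction(1, 2))
print("P(A|B) =", coin)                        # 1/2

# HIV example: A = HIV positive, B = positive test.
p_hiv = Fraction("0.002")
p_pos_given_hiv = Fraction("0.998")
p_pos_given_no_hiv = 1 - Fraction("0.998")
# Total probability of a positive test, over infected and uninfected people.
p_pos = p_pos_given_hiv * p_hiv + p_pos_given_no_hiv * (1 - p_hiv)
hiv = bayes(p_pos_given_hiv, p_hiv, p_pos)
print("P(HIV positive | + test) =", hiv)       # 1/2
```

Only the ratio P(A)/P(B) enters the formula, which is why knowing the ratio is enough.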

In law, mixing these two probabilities up is called the "prosecutor's paradox" (more commonly known as the prosecutor's fallacy).
A prosecutor will often have a reasonable estimate of

P(the evidence | the accused is innocent)

and then incorrectly state this probability as

P(the accused is innocent | the evidence)

because that is what the jury wants to know.

For example, in the Simpson trial the DNA experts gave a very small probability of a DNA match for a person chosen at random, say in the Los Angeles area. This is

P(match | Simpson is innocent)

but this is not the same as

P(Simpson is innocent | match).