You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.
1. Get five coins and put them in a container with a lid, so that you can flip all five coins at once by shaking the container and then letting the coins fall to the bottom. If you flip the five coins, and the probability of one coin being "tails" is 0.50, what is the probability of getting "tails" on all five coins?
2. Now you are going to do an experiment to see whether thinking about tails makes tails more likely to appear. You will think about tails, shake your coin container, and see how many tails you get. What is the biological null hypothesis? The biological alternative hypothesis? The statistical null hypothesis? The statistical alternative hypothesis?
Answer: Biological null: I cannot control coin flips with my mind.
Biological alternative: I can control coin flips with my mind.
Statistical null: Half or fewer of the coin flips will be tails. (We almost always do two-tailed tests; the null hypothesis for a two-tailed test would be that half of the flips were tails. However, because we would only consider an excess of tails to be evidence of mind control, we do a one-tailed test and consider an excess of heads to fit the null hypothesis.)
Statistical alternative: More than half of the flips will be tails.
3. Now do the experiment. Think about tails, flip your five coins, and record how many tails (0 to 5) you got.
Answer: When I did it, I got 2 tails. Out of the 66 people in the class, 10 said they got 5 tails on the first flip. Under the null hypothesis, we would expect 0.03125×66=2.1 people to be that lucky. Applying the exact test of goodness-of-fit (10 first-flippers, 56 non-first-flippers, expected proportions 0.03125 and 0.96875), the P-value is 0.00004. So either some of you really can control coin flipping with your minds, or some of you lied about your results.
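The arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet or web page used in class: the function name binomial_upper_tail is made up for this example, and it sums only the upper tail of the binomial distribution (10 or more successes out of 66), which is an assumption about how the tail is defined.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) when X is the number of successes in n trials,
    each with success probability p (hypothetical helper)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that all five fair coins land tails (question 1).
p_all_tails = 0.5 ** 5

# Number of students, out of 66, expected to get five tails on the first flip.
expected = 66 * p_all_tails

# Chance that 10 or more of 66 students would be that lucky under the null.
p_value = binomial_upper_tail(10, 66, p_all_tails)

print(p_all_tails)          # 0.03125
print(round(expected, 1))   # 2.1
print(p_value)              # a very small number, on the order of 0.00004
```

With an expected count this small, the lower tail contributes essentially nothing, so the one-sided sum is a reasonable stand-in for the exact test here.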
4. If you got five tails in question 3, skip question 4 and just answer question 5. If you didn't get five tails, maybe you just weren't doing the experiment right. So work your way down this list, record the number of tails for each experiment, and keep going until you get five tails, then stop and go to question 5. Turn in a table of your results, with rows labelled "a" through "o" and one column for each condition (if you have to shake longer, etc.). Also turn in your drawings, if you get that far.
Answer: Here are the results I got:
a 1 4
b 2 1
c 1 2
d 2 3
e 3 3
f 4 2
g 3 2
h 2 3
i 1 3
j 1 2
k 2 3
l 2 1
m 2 4
n 3 5
Out of 66 students, there should have been 16.4 (24.9%) who had 5 tails before getting to condition "L", where you had to draw a coin. Instead, there were 44 students. The P-value from the exact test of goodness-of-fit is 1.6×10⁻¹³. So it looks like over one-third of the class either can control coins with their minds, or lied about their data.
5. Based on the results of just your last experiment (the one that gave you five tails), what would a stupid or evil biologist conclude?
Answer: I would ignore the first 30 times I flipped the coins and conclude that there is statistically significant evidence that if I look at a computer image of a beaver's tail and shake the coins really hard, I can control coin flips with my mind.
6. You're not stupid or evil (I hope), so what do you conclude?
Answer: Knowing that it took 31 sets of 5 flips to get one that was all tails, I would conclude that the data fit the null hypothesis quite well, and the 5 tails while looking at a picture of a beaver's tail and shaking the coins really hard was just due to chance. Even if I'd gotten 5 tails on the first flip, I would have known that it's very unlikely that I can really control coins with my mind, so I would have demanded a P-value much smaller than 0.05 to reject the null hypothesis.
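To see why one all-tails result in 31 tries fits the null hypothesis, here is a rough sketch of the calculation: the chance of getting at least one set of five tails somewhere in 31 independent sets of five fair coins. The variable names are just for illustration.

```python
# Probability that a single set of five fair coins is all tails.
p_all_tails = 0.5 ** 5

# Number of sets of five flips it took, from the answer above.
sets = 31

# P(at least one all-tails set) = 1 - P(no all-tails set in any of the 31).
p_at_least_one = 1 - (1 - p_all_tails) ** sets

print(round(p_at_least_one, 2))  # about 0.63
```

In other words, someone who keeps flipping for 31 sets is more likely than not to see five tails at least once, with no mind control required.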
7. Download the balance data set, which has the data everyone in the class collected for homework 1. Pick two of the nominal variables that have two values (sex, arm on top, thumb on top, shoes or barefoot, hates math or other). Write down the statistical null hypothesis in terms of the two variables. Then analyze the data using each of the three tests of independence that you've learned. Give your raw data (the numbers of each of the four combinations of your two variables) and the P-values for the three tests in a nice little table (do NOT just copy over the spreadsheets).
Answer: Using sex and arm folding as an example, the statistical null hypothesis is "The proportion of males who fold with their right arm on top is equal to the proportion of females who fold with their right arm on top." You could also put it the other way around: "The proportion of right-on-top folders who are female is equal to the proportion of left-on-top folders who are female." The null hypothesis is NOT that the proportion is 50%. If you were just looking at arm folding, using a test of goodness-of-fit to test the null hypothesis that the proportion of right-on-top was 50% would be interesting. But when you're using a test of independence to analyze two nominal variables, you're testing the null hypothesis that the proportion for one variable is the same for the different values of the other variable.
Here are the results for all of the combinations:
            F      M      chi-square   G       Fisher's
arm L       138    130    0.320        0.320   0.342
arm R       126    141

            F      M      chi-square   G       Fisher's
thumb L     139    158    0.189        0.188   0.193
thumb R     125    113

            F      M      chi-square   G       Fisher's
NoShoes     124    116    0.333        0.333   0.340
Shoes       140    155

            F      M      chi-square   G       Fisher's
NotMath     187    207    0.145        0.145   0.169
Math        77     64

            arm L  arm R  chi-square   G       Fisher's
thumb L     151    146    0.699        0.699   0.728
thumb R     117    121

            arm L  arm R  chi-square   G       Fisher's
NoShoes     112    128    0.153        0.153   0.165
Shoes       156    139

            arm L  arm R  chi-square   G       Fisher's
NotMath     197    197    0.942        0.942   1.000
Math        71     70

            thumb L  thumb R  chi-square   G       Fisher's
NoShoes     138      102      0.404        0.404   0.432
Shoes       159      136

            thumb L  thumb R  chi-square   G       Fisher's
Math        80       61       0.733        0.733   0.768
NotMath     217      177

            NoShoes  Shoes  chi-square   G       Fisher's
Math        68       73     0.349        0.349   0.375
NotMath     172      222
8. Do the same three tests of independence on the same two variables as in question 7, only just use the data that you collected yourself on 8 or more individuals. Present the raw data and P-values in another nice little table.
Answer: Here are the results for Abigail Palmieri's data on sex and arm folding:
            F      M      chi-square   G       Fisher's
arm L       0      2      0.206        0.132   0.464
arm R       3      3
9. Write a few sentences about how similar the three P-values are in question 7, how similar the three P-values are in question 8, whether you got significance in any of the tests, and if so, what you think that means biologically.
Answer: The three P-values in question 7 are not identical, but they are very similar to each other. This illustrates that with large sample sizes, it doesn't really matter whether you use the chi-square, G, or Fisher's exact test of independence; they will all give you about the same result. With the very small sample size in question 8, the P-values are very different; this illustrates why it's important to use the more accurate Fisher's exact test for small sample sizes. For question 7, none of the results were statistically significant at the P<0.05 level. For each pair of variables, you could then say that "There's no statistically significant evidence that variable X is related to variable Y." For example, you could say that "There's no statistically significant evidence that which arm people fold on top is related to sex." To be strictly correct, you shouldn't say "Which arm people fold on top is not related to sex," just that there's "no statistically significant evidence" that they're related. This is because it's possible that they're related, but the difference in proportions was too small to be detected with this sample size.
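For anyone who wants to check the comparison without the spreadsheets, here is a minimal Python sketch of the three tests of independence for a 2×2 table. It is not the class spreadsheet; the function names are invented for this example, and it assumes no Yates or Williams correction for the chi-square and G-tests and the usual two-tailed Fisher's exact test (summing the probabilities of all tables that are no more likely than the observed one). It also assumes that no row or column total is zero.

```python
from math import comb, erfc, log, sqrt

def p_from_chi2_1df(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

def independence_tests(a, b, c, d):
    """Return (chi-square P, G-test P, Fisher's exact P) for the
    2x2 table [[a, b], [c, d]]. Hypothetical helper, no corrections."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Cells with an observed count of 0 contribute nothing to G.
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

    def table_prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    fisher = sum(p for p in map(table_prob, range(lowest, highest + 1))
                 if p <= p_observed * (1 + 1e-7))
    return p_from_chi2_1df(chi2), p_from_chi2_1df(g), min(fisher, 1.0)

# Class data on sex and arm folding (question 7): three similar P-values.
large = independence_tests(138, 130, 126, 141)
# One student's data on sex and arm folding (question 8): three different ones.
small = independence_tests(0, 2, 3, 3)

print([round(p, 3) for p in large])
print([round(p, 3) for p in small])
```

Run on the two sex and arm folding tables, this should show the same pattern described above: close agreement among the three tests for the class data, and wide disagreement for the eight-person sample.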