Biological Data Analysis: Homework 3

Due Tuesday, Sept. 22

You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.

1. As I was typing this assignment, our cat Gus wanted me to pet him, so he patted me on the arm with his left paw 9 times and his right paw 2 times. Is this significantly different from a 1:1 ratio? Analyze the data using all three of the goodness-of-fit tests we've learned, report the P-values, and write a sentence interpreting any differences among the results.

exact binomial: P=0.065
chi-square test: chi-square=4.455, 1 d.f., P=0.035
G-test: G=4.818, 1 d.f., P=0.028
The chi-square and G tests give significant (P<0.05) P-values; the correct P-value, given by the exact test, is not quite significant (P>0.05). This shows that with small sample sizes, the chi-square and G tests can give misleading results.
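
If you'd like to check these numbers without the spreadsheets, here is a minimal sketch in Python that uses only the standard library. The function names are my own, chosen for illustration. It assumes a 1:1 null hypothesis and takes advantage of the fact that, with one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function.

```python
from math import comb, erfc, log, sqrt


def exact_binomial_p(k, n):
    """Two-tailed exact binomial P-value under a 1:1 null hypothesis."""
    k = min(k, n - k)  # work with the smaller of the two counts
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)  # the 1:1 null is symmetric, so double one tail


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_stat(observed, expected):
    """G statistic: 2 * sum of O * ln(O / E)."""
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected))


def p_value_1df(stat):
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return erfc(sqrt(stat / 2))


observed = [9, 2]                    # left paw, right paw
expected = [sum(observed) / 2] * 2   # a 1:1 ratio gives 5.5 and 5.5

p_exact = exact_binomial_p(observed[0], sum(observed))
chi2 = chi_square_stat(observed, expected)
g = g_stat(observed, expected)

print(f"exact binomial: P={p_exact:.3f}")
print(f"chi-square={chi2:.3f}, 1 d.f., P={p_value_1df(chi2):.3f}")
print(f"G={g:.3f}, 1 d.f., P={p_value_1df(g):.3f}")
```

The doubling trick in `exact_binomial_p` only works because the null hypothesis is 1:1; for any other expected ratio the two tails are not mirror images and need to be summed separately.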

If you entered "y" for the Yates continuity correction, which I mentioned very briefly in class, the P-values for the chi-square and G-test are a lot closer to 0.065. However, they're still not exactly the same as the correct value from the exact test, so the basic point, that the chi-square and G-test give somewhat inaccurate results with small sample sizes, is still true.
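
To see what the correction does, here is a rough sketch in Python. It assumes the correction is applied by moving each observed count 0.5 closer to its expected value before computing either statistic; that is my assumption about the procedure, not a description of the spreadsheet's internals, and the helper name `yates_adjust` is hypothetical.

```python
from math import erfc, log, sqrt


def yates_adjust(observed, expected):
    """Move each observed count 0.5 toward its expected value (assumed procedure)."""
    adjusted = []
    for o, e in zip(observed, expected):
        if o > e:
            adjusted.append(o - 0.5)
        elif o < e:
            adjusted.append(o + 0.5)
        else:
            adjusted.append(o)
    return adjusted


def p_value_1df(stat):
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return erfc(sqrt(stat / 2))


observed = [9, 2]
expected = [5.5, 5.5]
adjusted = yates_adjust(observed, expected)   # [8.5, 2.5]

chi2_yates = sum((o - e) ** 2 / e for o, e in zip(adjusted, expected))
g_yates = 2 * sum(o * log(o / e) for o, e in zip(adjusted, expected))

p_chi2_yates = p_value_1df(chi2_yates)
p_g_yates = p_value_1df(g_yates)

print(f"chi-square with Yates: P={p_chi2_yates:.3f}")
print(f"G-test with Yates: P={p_g_yates:.3f}")
```

Under this assumption, both corrected P-values land on either side of 0.065 rather than well below it, which is what "a lot closer" means here.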

Figure: hand clasping. Left thumb on top; right thumb on top.

2. Some people clasp their hands with the right thumb on top; others clasp with the left thumb on top. Doing it the opposite to what you normally do feels very weird to most people. Falk and Ayala (1971) collected data on 1187 individuals, recording whether each one clasped their hands with the right thumb on top (R) or the left thumb on top (L). There were 535 R individuals and 652 L individuals. Is this significantly different from a 1:1 ratio of R and L individuals? Analyze the data using all three goodness-of-fit tests, report the P-values, and write a sentence interpreting any differences among the results.

exact binomial: P=0.00075
chi-square test: chi-square=11.532, 1 d.f., P=0.000684
G-test: G=11.551, 1 d.f., P=0.000677
The chi-square and G tests give P-values that are only slightly too low, showing that with large sample sizes, all three tests give similar results.
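
One way to put a number on "slightly too low" is to divide the chi-square P-value by the exact P-value for each data set. The sketch below does this for the cat data and the hand-clasping data; the function names are mine, and it relies on Python's arbitrary-size integers to sum the binomial coefficients for n=1187 exactly.

```python
from math import comb, erfc, sqrt


def exact_binomial_p(k, n):
    """Two-tailed exact binomial P-value under a 1:1 null hypothesis."""
    k = min(k, n - k)
    # Integer sum divided by an integer power of 2: no overflow, even for large n.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


def chi_square_p(a, b):
    """Chi-square goodness-of-fit P-value (1 d.f.) for counts a and b vs. 1:1."""
    e = (a + b) / 2
    stat = ((a - e) ** 2 + (b - e) ** 2) / e
    return erfc(sqrt(stat / 2))


def ratio_to_exact(a, b):
    """Chi-square P-value as a fraction of the exact P-value."""
    return chi_square_p(a, b) / exact_binomial_p(a, a + b)


small = ratio_to_exact(9, 2)        # question 1, n=11
large = ratio_to_exact(535, 652)    # question 2, n=1187

print(f"n=11:   chi-square P is {small:.0%} of the exact P")
print(f"n=1187: chi-square P is {large:.0%} of the exact P")
```

A ratio near 1 means the approximation is doing its job; the small sample should come out much further from 1 than the large one.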

If you read this question and didn't immediately clasp your own hands to see which thumb was on top, you lack the sense of curiosity about the natural world necessary to be a biologist, and you should change your major right now. And in case you're wondering, the way you clasp your hands is not a simple genetic trait.

3. McDonald (1989) collected amphipods (Platorchestia platensis) on a beach on Long Island, New York, and determined their genotype at the mannose-6-phosphate isomerase (Mpi) locus. Totalled across several dates, there were 1678 Mpi100/100, 2919 Mpi100/90, and 1203 Mpi90/90 individuals. Using the Hardy-Weinberg formula, the expected number of each genotype under the biological null hypothesis of no natural selection and no population mixing is 1697.2 Mpi100/100, 2880.6 Mpi100/90, and 1222.2 Mpi90/90. Is the difference in genotype proportions between the observed data and the expected under Hardy-Weinberg equilibrium significant? Test the data using the chi-squared and G-tests of goodness of fit.

chi-square test: chi-square=1.031, 1 d.f., P=0.31
G test: G=1.031, 1 d.f., P=0.31
P is greater than 0.05 for either test, so the difference between observed and expected is not significant. As described in the textbook (but not in class), Hardy-Weinberg proportions are an "intrinsic hypothesis," meaning that the expected genotype numbers are based on the allele frequencies in the data, not something known before doing the experiment. This is why there is just one degree of freedom, not two. Don't worry if you overlooked this; unless you're doing Hardy-Weinberg, intrinsic hypotheses are rare in biology.
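
To make the "intrinsic hypothesis" idea concrete, here is a short Python sketch that estimates the allele frequency from the observed genotypes, builds the Hardy-Weinberg expected numbers from it, and then loses one extra degree of freedom for that estimated parameter. The variable names are mine. Because it uses unrounded expected values, the statistics may differ from the ones above in the third decimal place.

```python
from math import erfc, log, sqrt

observed = {"100/100": 1678, "100/90": 2919, "90/90": 1203}
n = sum(observed.values())

# Frequency of the Mpi100 allele, estimated from the data themselves.
p = (2 * observed["100/100"] + observed["100/90"]) / (2 * n)
q = 1 - p

# Hardy-Weinberg expected numbers: p^2, 2pq, and q^2 times the sample size.
expected = {"100/100": p * p * n, "100/90": 2 * p * q * n, "90/90": q * q * n}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
g = 2 * sum(observed[k] * log(observed[k] / expected[k]) for k in observed)

# 3 genotype classes minus 1, minus 1 more for the allele frequency
# that was estimated from the data (the intrinsic hypothesis).
df = len(observed) - 1 - 1

# This shortcut for the chi-square upper tail is valid only when df == 1.
p_chi2 = erfc(sqrt(chi2 / 2))
p_g = erfc(sqrt(g / 2))

print(f"expected: {expected}")
print(f"chi-square={chi2:.2f}, {df} d.f., P={p_chi2:.2f}")
print(f"G={g:.2f}, {df} d.f., P={p_g:.2f}")
```

If the expected proportions had come from somewhere outside the data (an extrinsic hypothesis), `df` would be 2 and the `erfc` shortcut would no longer apply.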

4. Plot the data from question 3 on a graph. You must create this graph using a computer; do not draw it by hand.

One thing I was looking for in this graph is that you show both the observed data and the expected, since the similarity of observed to expected is the main result. A bar graph is best for this, with bars for the observed numbers and either bars of a different pattern, or horizontal lines, for the expected values. Some of you used bars of different colors for observed and expected, but when you printed them in black-and-white, they looked identical. This is why I recommend using different patterns, not different colors.

You also needed to label the Y-axis, so people know whether it is proportions, percentages, or numbers of individuals.
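
For anyone who would rather script the graph, here is one possible sketch in Python. It assumes the third-party matplotlib library is installed; the function names and the output file name are hypothetical. The observed bars are plain white and the expected bars are hatched, so the two remain distinguishable when printed in black-and-white, and the Y-axis is labeled with its units.

```python
def bar_series():
    """Genotype labels with the observed and expected numbers from question 3."""
    genotypes = ["Mpi100/100", "Mpi100/90", "Mpi90/90"]
    observed = [1678, 2919, 1203]
    expected = [1697.2, 2880.6, 1222.2]
    return genotypes, observed, expected


def draw_bar_graph(path):
    """Save a grouped bar graph of observed vs. expected numbers to `path`."""
    # Imported here so the data function above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file without needing a display
    import matplotlib.pyplot as plt

    genotypes, observed, expected = bar_series()
    positions = list(range(len(genotypes)))
    width = 0.4

    fig, ax = plt.subplots()
    # Plain white bars for observed, hatched bars for expected.
    ax.bar([i - width / 2 for i in positions], observed, width,
           color="white", edgecolor="black", label="Observed")
    ax.bar([i + width / 2 for i in positions], expected, width,
           color="white", edgecolor="black", hatch="//",
           label="Expected (Hardy-Weinberg)")

    ax.set_xticks(positions)
    ax.set_xticklabels(genotypes)
    ax.set_xlabel("Genotype")
    ax.set_ylabel("Number of individuals")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `draw_bar_graph("mpi_genotypes.png")` would write the figure to a file of that name in the current directory.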

5. Those pictures I've been showing before class this week are from a trip I took 9 years ago, and I can remember each day of the trip in vivid detail. But I couldn't tell you much about what I was doing two weeks ago. Biological statistics is a very important part of your life, of course, but it shouldn't be the only thing you do. Do something fun and adventurous and exciting and interesting this weekend, so fun and so adventurous that you'll remember it 9 years from now. If you have the kind of fun adventure you can tell me about, then tell me about it; if you have the kind of fun you'd like to keep private, then don't tell me about it.

For those of you who had interesting adventures, thank you for telling me about them. For those of you who had really interesting adventures, thank you for not telling me about them, so I won't be considered a co-conspirator when the police catch you.


Falk, C.T., and F.J. Ayala. 1971. Genetic aspects of arm folding and hand-clasping. Japanese journal of human genetics 15: 241-247.

McDonald, J.H. 1989. Selection component analysis of the Mpi locus in the amphipod Platorchestia platensis. Heredity 62: 243-249.

This page was last revised September 30, 2015.