Biological Data Analysis: Homework 3

Due Tuesday, Sept. 17

You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.

1. As I was typing this assignment, our cat Gus wanted me to pet him, so he patted me on the arm with his left paw 9 times and his right paw 2 times. Is this significantly different from a 1:1 ratio? Analyze the data using all three of the goodness-of-fit tests we've learned, report the P-values, and write a sentence interpreting any differences among the results.

exact binomial: P=0.065
chi-square: P=0.035
G-test: P=0.028
The chi-square and G-tests give significant (P<0.05) P-values, while the exact binomial test does not; with a sample this small, the chi-square and G-test P-values are too low.
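If you want to check these numbers without the spreadsheet, here is a minimal Python sketch of the three goodness-of-fit tests for the Gus data. Python is not part of the assignment, and the function names are made up for illustration. It assumes a 1:1 expected ratio, no continuity correction, and the fact that the upper tail of a chi-square distribution with 1 d.f. can be written with the complementary error function.

```python
from math import comb, erfc, log, sqrt

def exact_binomial_p(k, n):
    """Two-tailed exact binomial P-value for a 1:1 expected ratio."""
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    # Sum every outcome that is no more likely than the observed one.
    return sum(p for p in probs if p <= probs[k] * (1 + 1e-9))

def p_from_stat_1df(stat):
    """Upper-tail P-value of a chi-square distribution with 1 d.f."""
    return erfc(sqrt(stat / 2))

left, right = 9, 2          # pats with the left and right paw
n = left + right
expected = n / 2            # 1:1 ratio, so 5.5 pats expected per paw

# Chi-square and G statistics (illustrative, no continuity correction).
chi2 = sum((o - expected) ** 2 / expected for o in (left, right))
g = 2 * sum(o * log(o / expected) for o in (left, right))

p_exact = exact_binomial_p(left, n)
p_chi2 = p_from_stat_1df(chi2)
p_g = p_from_stat_1df(g)

print(f"exact binomial: P={p_exact:.3f}")
print(f"chi-square:     P={p_chi2:.3f}")
print(f"G-test:         P={p_g:.3f}")
```

The exact test adds up the probabilities of all outcomes that are as unlikely as, or less likely than, 9 left and 2 right; the other two tests only approximate that sum, which is why they drift apart when n is 11.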

2. Falk and Ayala (1971) collected data on 1187 individuals, recording whether each one clasped their hands with the right thumb on top (R) or the left thumb on top (L). There were 535 R individuals and 652 L individuals. Is this significantly different from a 1:1 ratio of R and L individuals? Analyze the data using all three goodness-of-fit tests, report the P-values, and write a sentence interpreting any differences among the results.

exact binomial: P=0.00075
chi-square: P=0.00068
G-test: P=0.00068
The chi-square and G-tests give P-values that are only slightly too low, showing that with large sample sizes, all three tests give similar results.
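For anyone curious how the exact test behaves with over a thousand observations, here is a rough Python sketch for the hand-clasping counts. It is only an illustration, not the method you were asked to use. It relies on Python's arbitrary-size integers to add up the binomial tail exactly, and it assumes that under a 1:1 expectation the two tails are mirror images, so the two-tailed P-value is twice the lower tail.

```python
from math import comb, erfc, log, sqrt

right_top, left_top = 535, 652
n = right_top + left_top
expected = n / 2

# Exact binomial: count the arrangements in the lower tail as whole numbers,
# then divide once at the end so nothing underflows along the way.
smaller = min(right_top, left_top)
lower_tail = sum(comb(n, i) for i in range(smaller + 1))
p_exact = min(1.0, 2 * lower_tail / 2**n)

# Chi-square and G statistics, each with 1 d.f.
chi2 = sum((o - expected) ** 2 / expected for o in (right_top, left_top))
g = 2 * sum(o * log(o / expected) for o in (right_top, left_top))
p_chi2 = erfc(sqrt(chi2 / 2))
p_g = erfc(sqrt(g / 2))

print(f"exact binomial: P={p_exact:.5f}")
print(f"chi-square:     P={p_chi2:.5f}")
print(f"G-test:         P={p_g:.5f}")
```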

3. Under certain conditions, animal cell lines can become "immortalized," meaning they will keep growing and dividing indefinitely in laboratory cultures. Nowak et al. (2004) looked at the effect of the pro-apoptotic protein Bax on immortalization of mouse muscle cells. They made mice without the Bax protein (Bax−/−), established cell lines from them, and compared them to cell lines from mice with the Bax protein (Bax+/− and Bax+/+). After 50 days, all 7 lines of Bax−/− cells were still growing, while only 3 out of 9 of the lines with Bax were growing. Test the data using all three tests of independence, and compare the results of the three tests.

Fisher's exact test: P=0.011. Chi-square test: P=0.0063. G-test: P=0.0018. The different P-values illustrate that with small sample sizes, the three tests give different results; in this case, the chi-square and G-tests give P-values that are too low.
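The same three answers can be checked with a short Python sketch; treat it as one possible way to do the arithmetic, with hypothetical function names. It assumes the two-tailed Fisher's exact test sums the probabilities of every table that is no more likely than the observed one, that the chi-square test is done without a continuity correction, and that cells with a count of zero contribute nothing to G.

```python
from math import comb, erfc, log, sqrt

def fisher_exact_p(a, b, c, d):
    """Two-tailed Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def independence_stats(table):
    """Chi-square and G statistics for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
            if obs > 0:                 # 0 * ln(0) is taken as 0
                g += 2 * obs * log(obs / exp)
    return chi2, g

# Rows: Bax-/- and Bax+; columns: still growing, not growing.
table = [[7, 0], [3, 6]]
p_fisher = fisher_exact_p(7, 0, 3, 6)
chi2, g = independence_stats(table)
p_chi2 = erfc(sqrt(chi2 / 2))   # 1 d.f. for a 2x2 table
p_g = erfc(sqrt(g / 2))

print(f"Fisher's exact: P={p_fisher:.4f}")
print(f"chi-square:     P={p_chi2:.4f}")
print(f"G-test:         P={p_g:.4f}")
```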

The figure shows how to enter the numbers in the spreadsheet for the chi-square or G-test. The same numbers are used for the Fisher's exact test. Some people used 3 alive and 9 dead for the Bax+ numbers, but the total was 9 for Bax+, so when the question says 3 were alive, it means that 6 were dead.

4. McDonald (1989) collected amphipods (Platorchestia platensis) on a beach on Long Island, New York, and determined their genotype at the mannose-6-phosphate isomerase (Mpi) locus. Totalled across several dates, there were 1002 Mpi100/100, 1715 Mpi100/90, and 761 Mpi90/90 females; there were 676 Mpi100/100, 1204 Mpi100/90, and 442 Mpi90/90 males. Is the difference in genotype proportions between females and males significant? Test the data using the chi-squared and G-tests of independence, and compare the results of the two tests. Optional: If you want some extra fun, use SAS to analyze the data using Fisher's exact test, as well. I'll talk about SAS in class in a couple of weeks, but if you want a challenge, you can read through the handbook page on SAS and try to figure it out yourself.

Chi-square: P=0.026. G-test: P=0.026. This illustrates that the chi-square and G-tests, which gave different results with the small numbers in the first question, give about the same result with a large sample size.
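Here is a minimal Python sketch of the same calculation, offered only as a cross-check on the spreadsheet. The variable names are made up. It assumes 2 degrees of freedom, (2-1)×(3-1), and uses the fact that the upper tail of a chi-square distribution with 2 d.f. is simply exp(-x/2).

```python
from math import exp, log

# Rows: females, males; columns: Mpi100/100, Mpi100/90, Mpi90/90.
table = [[1002, 1715, 761],
         [676, 1204, 442]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2 = g = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected count if genotype proportions were the same in both sexes.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected
        g += 2 * obs * log(obs / expected)

# Upper-tail probability for 2 d.f.
p_chi2 = exp(-chi2 / 2)
p_g = exp(-g / 2)

print(f"chi-square: {chi2:.2f}, P={p_chi2:.3f}")
print(f"G:          {g:.2f}, P={p_g:.3f}")
```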

Only a couple of people tried the Fisher's exact test in SAS; here's how to set it up:

data amphipods;
   input sex $ genotype $ count;
   datalines;
female 100/100 1002
female 100/90 1715
female 90/90 761
male 100/100 676
male 100/90 1204
male 90/90 442
;
proc freq data=amphipods;
   weight count / zeros;
   tables sex*genotype / chisq;
   exact chisq;
run;

The result is P=0.026, illustrating that with large sample sizes, the chi-square and G-test are accurate.

5. Plot the data from question 4 on a graph. You must create this graph using a computer; do not draw it by hand.

  1. Look at the web page on drawing graphs with Excel.
  2. For nominal variables, plotting proportions is more informative than plotting the raw numbers. Plotting 1715 females and 1204 males with the Mpi100/90 genotype isn't helpful; what's interesting is that 49.3 percent of females and 51.9 percent of males had that genotype.
  3. Always label both the X and Y axes.
  4. Where possible, it's better to use X-axis labels, rather than a legend, to identify the different bars.
  5. Put things you want to compare next to each other; in this case, put the male percentage next to the female percentage for each genotype.
  6. If you're printing in black-and-white, don't use colors for the bars that will print as the same shade of gray.

6. Biological statistics is an important part of your life, of course, but it shouldn't be the only part of your life. Do something fun and adventurous this weekend, so fun and so adventurous that you'll remember it 10 years from now. If you have the kind of fun adventure you can tell me about, then tell me about it; if you have the kind of fun you'd like to keep private, then don't tell me about it.


Falk, C.T., and F.J. Ayala. 1971. Genetic aspects of arm folding and hand-clasping. Japanese Journal of Human Genetics 15: 241-247.

McDonald, J.H. 1989. Selection component analysis of the Mpi locus in the amphipod Platorchestia platensis. Heredity 62: 243-249.

Nowak, J.A., J. Malowitz, M. Girgenrath, C.A. Kostek, A.J. Kravetz, J.A. Dominov, and J.B. Miller. 2004. Immortalization of mouse myogenic cells can occur without loss of p16INK4a, p19ARF, or p53 and is accelerated by inactivation of Bax. BMC Cell Biology 5:1.

This page was last revised September 12, 2013.