Biological Data Analysis: Homework 4

Due Tuesday, Sept. 24

You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.

1. In 1983, I collected data on polymorphism in the gene coding for glucose-6-phosphate isomerase (GPI) in the amphipod crustacean Megalorchestia californiana (the critter shown in the banner at the top of the handbook). In 2009, I collected from six of the same locations, to see whether natural selection by changing climate had caused a change in the allele frequencies. There are two GPI alleles, "fast" and "slow"; the table below shows the number of alleles at each location in each year (it's an autosomal gene, so each individual has two alleles). Analyze the data using the Cochran-Mantel-Haenszel test, draw a graph that summarizes the data, and write a sentence interpreting the results.

                            Number of alleles
Location              Year     fast  slow
----------------      ----     ----  ----
Pt. Townsend, WA      1983        4   182 
                      2009        8   218

Neskowin, OR          1983      237   181
                      2009      376   324

Siuslaw Jetty, OR     1983     1006   780
                      2009      301   233

Winchester Bay, OR    1983      190   172
                      2009      539   457

Empire, OR            1983       62    72
                      2009      295   281

Bastendorf Beach, OR  1983      298   270
                      2009      467   387

The P-value from the Cochran-Mantel-Haenszel test is 0.571. This means there is no significant difference in allele frequency between the 1983 and 2009 samples. Note: don't say something like "This shows that allele frequencies did not change between 1983 and 2009." The allele frequencies may have changed; the change just may not have been big enough to be statistically significant.
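
If you would rather script the test than use a spreadsheet, here is a minimal sketch in Python of one way to compute the Cochran-Mantel-Haenszel statistic for a set of 2x2 tables. The function name `cmh_test` and the `COUNTS` dictionary are made up for this illustration, and the sketch assumes the usual continuity correction (subtracting 0.5 from the absolute summed difference); with that assumption it should come out close to the P-value of 0.571 given above, and without it the P-value is slightly different.

```python
import math

# Allele counts from the table above, one tuple per location:
# (fast 1983, slow 1983, fast 2009, slow 2009)
COUNTS = {
    "Pt. Townsend, WA": (4, 182, 8, 218),
    "Neskowin, OR": (237, 181, 376, 324),
    "Siuslaw Jetty, OR": (1006, 780, 301, 233),
    "Winchester Bay, OR": (190, 172, 539, 457),
    "Empire, OR": (62, 72, 295, 281),
    "Bastendorf Beach, OR": (298, 270, 467, 387),
}


def cmh_test(tables, continuity=True):
    """Cochran-Mantel-Haenszel chi-square and P value for 2x2 tables.

    Each table is (a, b, c, d): row 1 is (a, b), row 2 is (c, d).
    The continuity correction is an assumption of this sketch.
    """
    diff_sum = 0.0  # sum over strata of observed minus expected for cell a
    var_sum = 0.0   # sum over strata of the variance of cell a
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    numerator = abs(diff_sum)
    if continuity:
        numerator = max(numerator - 0.5, 0.0)
    chi2 = numerator ** 2 / var_sum
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = cmh_test(COUNTS.values())
print(f"CMH chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

The sketch uses only the standard library, so the chi-square tail probability is computed from the complementary error function rather than a statistics package.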

On the graph, you should use proportions as your Y variable, not the raw numbers. Also, because there are just two alleles, "fast" and "slow," you don't need to plot both; showing that the allele frequency of fast is 0.02 implies that the frequency of slow is 0.98.

Gpi allele frequencies
Proportions (with 95% confidence intervals) of the fast allele at the Gpi locus in samples of Megalorchestia californiana. Black bars are 1983 samples, gray bars are 2009.
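
To get the numbers for a graph like this, each pair of counts has to be turned into a proportion and a confidence interval. The sketch below is one possible way to do that in Python. It uses the Wilson score interval, which is an assumption of this example and not necessarily the method used for the graph above (an exact binomial interval would give somewhat different limits). The function name `wilson_interval` is made up for the illustration.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    z = 1.96 is the normal critical value for 95% confidence.
    """
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    )
    return center - half_width, center + half_width


# Pt. Townsend, 1983: 4 fast alleles out of 4 + 182 = 186 sampled.
fast, slow = 4, 182
proportion = fast / (fast + slow)
lower, upper = wilson_interval(fast, fast + slow)
print(f"fast = {proportion:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Only the fast allele is computed, because the slow frequency is just one minus the fast frequency.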

2. Collect some measurement data. You must have one nominal variable with at least six values (such as six different biology classes) and one measurement variable (such as blood pressure). You must have at least ten observations in each of your six or more categories (such as ten students in each of the six biology classes). Your data set could be a published data set, some data you've collected for your research, or some data you collect for this assignment.

Try to pick groups where the difference in the measurement variable is not so big that it's obvious the statistical test will be highly significant. For example, if you measure the weights of leaves, measure leaves from six or more trees of the same species, not different species with obviously different leaves.

Put your raw data (the individual observations) in a table. For each category, calculate the mean, median, range, standard deviation, standard error of the mean, and 95 percent confidence interval (you may use a spreadsheet, web page or computer program for this). Add these summary numbers to the table.

(Note that we haven't talked about standard deviation, standard error, or confidence intervals yet; skim through the appropriate textbook chapters if you want to understand these numbers.)
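
If you use a computer program for the summary numbers, the calculation for one category might look like the Python sketch below. The function name `summarize` and the leaf weights are hypothetical, made up for this illustration; they are not real data. The sketch assumes the confidence interval is based on Student's t, so you have to look up the critical value for your own sample size (2.262 is the value for ten observations, which is 9 degrees of freedom).

```python
import math
import statistics


def summarize(values, t_crit):
    """Summary statistics for one category.

    t_crit is the two-tailed 5% critical value of Student's t for
    len(values) - 1 degrees of freedom, looked up by the caller.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = t_crit * se        # half-width of the 95% interval
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical leaf weights (grams) from one tree; not real data.
leaf_weights = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.49, 0.44, 0.52, 0.46]
T_CRIT_DF9 = 2.262  # t critical value for 9 degrees of freedom, alpha = 0.05
for name, value in summarize(leaf_weights, T_CRIT_DF9).items():
    print(name, value)
```

You would call the function once per category and copy the results into your table.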

Keep a copy of your raw data; you'll need it for the next homework assignment.

3. Draw a bar graph, with vertical columns representing the means of the data you collected for question 2. If you want a challenge, add thin vertical lines representing the 95 percent confidence intervals, as on my example graph below. Don't struggle if you can't figure out how to add the confidence intervals; I'll show you on Tuesday. Your graph should have a legend underneath it, explaining what it is, like this:

Bar graph of bird abundance
The number of bird species observed in the Christmas Bird Count at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.
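
For anyone drawing the graph with a program instead of a spreadsheet, here is a rough Python sketch, assuming the matplotlib library is installed. The function names `error_bar_lengths` and `draw_bar_graph` are made up for this illustration. The one step that trips people up is that matplotlib's `yerr` argument wants the lengths of the lines below and above each mean, not the confidence limits themselves, so the first function does that conversion.

```python
def error_bar_lengths(means, lowers, uppers):
    """Convert confidence limits into the (below, above) line lengths
    that matplotlib's yerr argument expects."""
    below = [m - lo for m, lo in zip(means, lowers)]
    above = [hi - m for m, hi in zip(means, uppers)]
    return [below, above]


def draw_bar_graph(labels, means, lowers, uppers, y_label, filename):
    """Save a bar graph of means with confidence-interval lines."""
    # Imported here so the helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(
        labels,
        means,
        color="gray",
        yerr=error_bar_lengths(means, lowers, uppers),
        capsize=4,
        ecolor="black",
    )
    ax.set_ylabel(y_label)
    fig.savefig(filename)
    plt.close(fig)
```

A call with made-up numbers might be `draw_bar_graph(["Tree A", "Tree B"], [0.46, 0.51], [0.42, 0.47], [0.50, 0.55], "Leaf weight (g)", "leaves.png")`. The legend underneath the graph still has to be written by hand.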

4. Flip through your favorite scientific journal until you find a graph with vertical bars around the means (commonly called "error bars"), like the ones above. Tell me what the bars represent: standard deviation, standard error, 95% confidence limits, or something else. If the paper doesn't say what the bars are, tell me that (if the information isn't in the figure caption, look in the Methods section and in the captions to other graphs). Also list all of the variables shown on the graph, and say whether each one is a nominal or measurement variable. Give the citation information (authors, year, article title, journal, volume, page numbers). Print or photocopy the page with the graph on it (not the whole article) and attach that to your homework.

Return to John McDonald's home page

This page was last revised September 20, 2013. Its URL is