You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.
1. In 1983, I collected data on polymorphism in the gene coding for glucose-6-phosphate isomerase (GPI) in the amphipod crustacean Megalorchestia californiana (the critter shown in the banner at the top of the handbook). In 2009, I collected samples from six of the same locations to see whether natural selection caused by a changing climate had changed the allele frequencies. There are two GPI alleles, "fast" and "slow"; the table below shows the number of each allele at each location in each year (it's an autosomal gene, so each individual has two alleles). Analyze the data using the Cochran-Mantel-Haenszel test, draw a graph that summarizes the data, and write a sentence interpreting the results.
Number of alleles
Location               Year   fast   slow
--------------------   ----   ----   ----
Pt. Townsend, WA       1983      4    182
Pt. Townsend, WA       2009      8    218
Neskowin, OR           1983    237    181
Neskowin, OR           2009    376    324
Siuslaw Jetty, OR      1983   1006    780
Siuslaw Jetty, OR      2009    301    233
Winchester Bay, OR     1983    190    172
Winchester Bay, OR     2009    539    457
Empire, OR             1983     62     72
Empire, OR             2009    295    281
Bastendorf Beach, OR   1983    298    270
Bastendorf Beach, OR   2009    467    387
The P-value from the Cochran-Mantel-Haenszel test is 0.571, so there is no significant difference in the frequency of the fast allele between the 1983 and 2009 samples. Note: don't say something like "This shows that allele frequencies did not change between 1983 and 2009." The allele frequencies may have changed; the change may just not have been big enough to be statistically significant.
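If you want to check the P-value yourself, here is a minimal Python sketch of the Cochran-Mantel-Haenszel calculation. It assumes the version of the test with the 0.5 continuity correction, which is the version that gives P = 0.571 for these data; the names `STRATA` and `cmh_test` are made up for this example. Each location is treated as one 2x2 table, with years as rows and alleles as columns.

```python
import math

# One 2x2 table per location: (1983 fast, 1983 slow, 2009 fast, 2009 slow),
# copied from the table above.
STRATA = {
    "Pt. Townsend, WA": (4, 182, 8, 218),
    "Neskowin, OR": (237, 181, 376, 324),
    "Siuslaw Jetty, OR": (1006, 780, 301, 233),
    "Winchester Bay, OR": (190, 172, 539, 457),
    "Empire, OR": (62, 72, 295, 281),
    "Bastendorf Beach, OR": (298, 270, 467, 387),
}


def cmh_test(tables, continuity=True):
    """Cochran-Mantel-Haenszel test on a series of 2x2 tables.

    Each table is (a, b, c, d), laid out as rows (a, b) and (c, d).
    Returns the chi-square statistic (1 d.f.) and its P-value.
    """
    sum_dev = 0.0  # sum over tables of (observed a - expected a)
    sum_var = 0.0  # sum over tables of the variance of a
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_dev += a - (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    correction = 0.5 if continuity else 0.0
    numerator = max(abs(sum_dev) - correction, 0.0)
    chi2 = numerator ** 2 / sum_var
    # Upper tail of chi-square with 1 d.f. = two-tailed normal P for sqrt(chi2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = cmh_test(STRATA.values())
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")  # prints chi-square = 0.32, P = 0.571
```

Passing `continuity=False` gives the uncorrected statistic, which is slightly larger; with samples this big the two versions lead to the same conclusion.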
On the graph, you should use proportions as your Y variable, not the raw numbers. Also, because there are just two alleles, "fast" and "slow," you don't need to plot both; showing that the allele frequency of fast is 0.02 implies that the frequency of slow is 0.98.
[Figure: Proportions (with 95% confidence intervals) of the fast allele at the Gpi locus in samples of Megalorchestia californiana. Black bars are 1983 samples, gray bars are 2009 samples.]
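The Y values for a graph like this are just fast/(fast + slow) for each sample. The caption doesn't say how its 95% confidence intervals were calculated, so the sketch below *assumes* the Wilson score interval for a binomial proportion; other methods (such as the exact Clopper-Pearson interval) would give slightly different bars. The names `wilson_interval`, `SAMPLES`, and `fast_proportions` are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wilson_interval(successes, n, z=Z_95):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# (location, year, fast, slow), copied from the table above
SAMPLES = [
    ("Pt. Townsend, WA", 1983, 4, 182),
    ("Pt. Townsend, WA", 2009, 8, 218),
    ("Neskowin, OR", 1983, 237, 181),
    ("Neskowin, OR", 2009, 376, 324),
    ("Siuslaw Jetty, OR", 1983, 1006, 780),
    ("Siuslaw Jetty, OR", 2009, 301, 233),
    ("Winchester Bay, OR", 1983, 190, 172),
    ("Winchester Bay, OR", 2009, 539, 457),
    ("Empire, OR", 1983, 62, 72),
    ("Empire, OR", 2009, 295, 281),
    ("Bastendorf Beach, OR", 1983, 298, 270),
    ("Bastendorf Beach, OR", 2009, 467, 387),
]


def fast_proportions(rows):
    """Proportion of fast alleles, with a 95% Wilson interval, for each sample."""
    out = []
    for location, year, fast, slow in rows:
        n = fast + slow
        lo, hi = wilson_interval(fast, n)
        out.append((location, year, fast / n, lo, hi))
    return out


for location, year, p, lo, hi in fast_proportions(SAMPLES):
    print(f"{location:22s}{year}  {p:.3f}  ({lo:.3f} to {hi:.3f})")
```

Only the fast allele is computed, for the reason given above: the slow-allele proportion is 1 minus the fast one.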
2. Collect some measurement data. You must have one nominal variable with at least six values (such as six different biology classes) and one measurement variable (such as blood pressure). You must have at least ten observations in each of your six or more categories (such as ten students in each of the six biology classes). Your data set could be a published data set, some data you've collected for your research, or some data you collect for this assignment.
Try to pick groups where the difference in the measurement variable is not so big that it's obvious the statistical test will be highly significant. For example, if you measure the weights of leaves, measure leaves from six or more trees of the same species, not different species with obviously different leaves.
Put your raw data (the individual observations) in a table. For each category, calculate the mean, median, range, standard deviation, standard error of the mean, and 95 percent confidence interval (you may use a spreadsheet, web page or computer program for this). Add these summary numbers to the table.
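Since a computer program is allowed, here is one minimal Python sketch of these summary statistics for a single category. Python's standard library has no t distribution, so this sketch finds the critical t value for the 95% confidence interval numerically (Simpson's rule for the area under the t density, then bisection); that is an implementation choice for this example, not the only way to do it. The function names and the example measurements are hypothetical.

```python
import math
import statistics


def t_critical_95(df, steps=2000):
    """Two-tailed 5% critical value of Student's t with df degrees of freedom.

    Found numerically: Simpson's rule for the area under the t density,
    plus bisection for the point where that area (from 0) reaches 0.475.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    def area(x):  # area under pdf from 0 to x (steps must be even)
        h = x / steps
        total = pdf(0.0) + pdf(x)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * pdf(i * h)
        return total * h / 3

    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if area(mid) < 0.475:  # 0.475 = 0.95 / 2, since the density is symmetric
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def summarize(values):
    """Descriptive statistics for one category of a measurement variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)         # standard error of the mean
    half_width = t_critical_95(n - 1) * se
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),  # largest minus smallest observation
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical measurements for one category (illustration only, not real data)
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for key, value in summarize(example).items():
    print(key, value)
```

Call `summarize` once for each of your six or more categories and copy the results into your table.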
Keep a copy of your raw data; you'll need it for the next homework assignment.
3. Draw a bar graph, with vertical columns representing the means and thin vertical lines representing the 95 percent confidence intervals of the data you collected for question 2. Your graph should have a legend underneath it, explaining what it is, like this:
[Figure: The number of bird species observed in the Christmas Bird Count at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.]
4. Flip through your favorite scientific journal until you find a graph with vertical bars around the means (commonly called "error bars"), like the ones above. Tell me what the bars represent: standard deviation, standard error, 95% confidence limits, or something else. If the paper doesn't say what the bars are, tell me that (if the information isn't in the figure caption, look in the Methods section and in the captions to other graphs). Also list all of the variables shown on the graph, and say whether each one is a nominal or measurement variable. Give the citation information (authors, year, article title, journal, volume, page numbers). Print or photocopy the page with the graph on it (not the whole article) and attach that to your homework.
This page was last revised September 27, 2012. Its URL is http://udel.edu/~mcdonald/stathw4.html