You must type this and all other homework assignments. Do not e-mail the assignment to me; turn it in early (at 322 Wolf) for a foreseeable absence, or turn it in late after an unexpected absence from class.
1. In 1983, I collected data on polymorphism in the gene coding for glucose-6-phosphate isomerase (GPI) in the amphipod crustacean Megalorchestia californiana (the critter shown in the banner at the top of the handbook). In 2009, I collected from six of the same locations, to see whether natural selection by changing climate had caused a change in the allele frequencies. There are two GPI alleles, "fast" and "slow"; the table below shows the number of alleles at each location in each year (it's an autosomal gene, so each individual has two alleles). Analyze the data using the Cochran-Mantel-Haenszel test, draw a graph that summarizes the data, and write a sentence interpreting the results.
                            Number of alleles
Location              Year    fast    slow
--------------------  ----    ----    ----
Pt. Townsend, WA      1983       4     182
                      2009       8     218
Neskowin, OR          1983     237     181
                      2009     376     324
Siuslaw Jetty, OR     1983    1006     780
                      2009     301     233
Winchester Bay, OR    1983     190     172
                      2009     539     457
Empire, OR            1983      62      72
                      2009     295     281
Bastendorf Beach, OR  1983     298     270
                      2009     467     387
The P-value from the Cochran-Mantel-Haenszel test is 0.571. This means there is no significant difference in allele frequency between the 1983 and 2009 samples. Note: don't say something like "This shows that allele frequencies did not change between 1983 and 2009." The allele frequencies may have changed; the change just may not have been big enough to be statistically significant.
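If you'd like to check the arithmetic yourself, here is a minimal sketch of the Cochran-Mantel-Haenszel calculation in Python, using only the standard library. The function name `cmh_test` and the variable names are my own choices for illustration, and the sketch assumes the usual continuity correction (subtracting 0.5 from the absolute summed deviation); with that correction it should give a P-value close to the 0.571 reported above.

```python
import math

# Allele counts copied from the table above:
# (location, (fast_1983, slow_1983), (fast_2009, slow_2009))
counts = [
    ("Pt. Townsend, WA",     (4, 182),    (8, 218)),
    ("Neskowin, OR",         (237, 181),  (376, 324)),
    ("Siuslaw Jetty, OR",    (1006, 780), (301, 233)),
    ("Winchester Bay, OR",   (190, 172),  (539, 457)),
    ("Empire, OR",           (62, 72),    (295, 281)),
    ("Bastendorf Beach, OR", (298, 270),  (467, 387)),
]


def cmh_test(tables, continuity=True):
    """Cochran-Mantel-Haenszel chi-square (1 d.f.) for a list of 2x2 tables.

    Each table is ((a, b), (c, d)). Assumes a continuity correction of 0.5
    when `continuity` is True. Returns (chi_square, p_value).
    """
    diff_sum = 0.0  # sum over strata of (observed a - expected a)
    var_sum = 0.0   # sum over strata of the variance of a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n
        var_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected_a
        var_sum += var_a
    numerator = abs(diff_sum)
    if continuity:
        numerator = max(0.0, numerator - 0.5)
    chi2 = numerator ** 2 / var_sum
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


tables = [(y1983, y2009) for _, y1983, y2009 in counts]
chi2, p_value = cmh_test(tables)
print(f"chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

Each location is one 2x2 table (year by allele), so the test asks whether the fast allele is consistently more or less common in 2009 than in 1983 across all six locations.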
On the graph, you should use proportions as your Y variable, not the raw numbers. Also, because there are just two alleles, "fast" and "slow," you don't need to plot both; showing that the allele frequency of fast is 0.02 implies that the frequency of slow is 0.98.
Proportions (with 95% confidence intervals) of the fast allele at the Gpi locus in samples of Megalorchestia californiana. Black bars are 1983 samples, gray bars are 2009.
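To get the numbers for a graph like this, you need the proportion of fast alleles in each sample and a confidence interval around it. The sketch below is one way to do that in Python; the function name `wilson_interval` is my own, and it uses the Wilson score interval, which is an assumption on my part and may not be the method used for the intervals in the graph described above. Only three of the twelve samples are shown.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Proportion and approximate 95% Wilson score interval.

    z=1.96 is the normal critical value for 95% confidence.
    Returns (proportion, lower_limit, upper_limit).
    """
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    ) / denom
    return p_hat, center - half_width, center + half_width


# (label, fast alleles, slow alleles), copied from the allele table
samples = [
    ("Pt. Townsend 1983", 4, 182),
    ("Pt. Townsend 2009", 8, 218),
    ("Neskowin 1983", 237, 181),
]

results = {}
for label, fast, slow in samples:
    results[label] = wilson_interval(fast, fast + slow)
    p, low, high = results[label]
    print(f"{label}: fast = {p:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Note that the interval is not symmetrical around the proportion when the proportion is near 0 or 1, as it is for Pt. Townsend.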
2. Collect some measurement data. You must have one nominal variable with at least six values (such as six different biology classes) and one measurement variable (such as blood pressure). You must have at least ten observations in each of your six or more categories (such as ten students in each of the six biology classes). Your data set could be a published data set, some data you've collected for your research, or some data you collect for this assignment.
Try to pick groups where the difference in the measurement variable is not so big that it's obvious the statistical test will be highly significant. For example, if you measure the weights of leaves, measure leaves from six or more trees of the same species, not different species with obviously different leaves.
Put your raw data (the individual observations) in a table. For each category, calculate the mean, median, range, standard deviation, standard error of the mean, and 95 percent confidence interval (you may use a spreadsheet, web page or computer program for this). Add these summary numbers to the table.
(Note that we haven't talked about standard deviation, standard error, or confidence intervals yet; skim through the appropriate textbook chapters if you want to understand these numbers.)
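If you'd rather use a computer program than a spreadsheet, here is a minimal sketch of the summary statistics for one category, using Python's standard `statistics` module. The function name `summarize` and the ten observations are made up for illustration; they are not real data. The confidence interval assumes exactly ten observations, so it uses the t critical value for 9 degrees of freedom (2.262); you would need a different value for a different sample size.

```python
import math
import statistics


def summarize(values, t_crit=2.262):
    """Summary statistics for one category.

    t_crit is the two-tailed 5% critical value of Student's t;
    the default 2.262 is only correct for n = 10 (9 degrees of freedom).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": sd,
        "se": se,
        "ci_lower": mean - t_crit * se,
        "ci_upper": mean + t_crit * se,
    }


# Hypothetical observations for one category (made-up numbers)
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
summary = summarize(example)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

You would call `summarize` once for each of your six or more categories and add the results to your table.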
Keep a copy of your raw data; you'll need it for the next homework assignment.
3. Draw a bar graph, with vertical columns representing the means of the data you collected for question 2. If you want a challenge, add thin vertical lines representing the 95 percent confidence intervals, as on my example graph below. Don't struggle if you can't figure out how to add the confidence intervals; I'll show you how on Tuesday. Your graph should have a legend underneath it, explaining what it is, like this:
The number of bird species observed in the Christmas Bird Count at seven locations in Delaware. Data points are the mean number of species for the counts in 2001 through 2006, with 95 percent confidence intervals.
4. Flip through your favorite scientific journal until you find a graph with vertical bars around the means (commonly called "error bars"), like the ones above. Tell me what the bars represent: standard deviation, standard error, 95% confidence limits, or something else. If the paper doesn't say what the bars are, tell me that (if the information isn't in the figure caption, look in the Methods section and in the captions to other graphs). Also list all of the variables shown on the graph, and say whether each one is a nominal or measurement variable. Give the citation information (authors, year, article title, journal, volume, page numbers). Print or photocopy the page with the graph on it (not the whole article) and attach that to your homework.
This page was last revised September 20, 2013. Its URL is http://udel.edu/~mcdonald/stathw4.html