Biological Data Analysis: Exam 2 Answers

  1. There are two nominal variables, with/without DHFR inhibitor and with/without new mutation, and the total sample size is greater than 1000, so chi-square or G-test of independence.
  2. One correct answer is a data set that is severely skewed, such as by one or more extreme outliers. Another correct answer is a variable such as length of life that involves waiting, since you know the median as soon as half the individuals are dead and don't have to wait for the last one to die.
  3. Chihuahua vs. doberman and small dogs vs. big dogs is one possible answer; it is not orthogonal because chihuahua vs. doberman is a part of small dogs vs. big dogs.
  4. Two nominal variables, type of lure and type of fish, and a total sample size of about 400, so Fisher's exact test.
  5. One nominal variable (which knee is injured), an external null hypothesis (one-fourth of injuries on each knee) and a total sample size over 1000, so chi-square or G-test of goodness-of-fit.
  6. A power analysis for a measurement variable requires four numbers. You already have effect size (a difference of 0.1 m/s) and alpha (0.05), so you need power and standard deviation.
  7. One nominal variable (which quadrant), an external null hypothesis (one-fourth of butterflies in each quadrant) and a total sample size over 1000, so chi-square or G-test of goodness-of-fit. Note that this is pretty much the same experiment and same answer as question 5. I use the Excel random number function to put the questions in random order, so don't be surprised if the final includes two or three questions in a row with the same answer; that can happen by chance and is not a trick.
  8. One meaningful nominal variable (brownie, joint or control) and one measurement variable (number of pellets eaten), so one-way anova, model I.
  9. Two nominal variables, MUNK17 vs. control and alive vs. dead, and a total sample size over 1000, so chi-square or G-test of independence.
  10. It is wrong because there are many different pairs of chicken breeds you could have compared, so the chance that the biggest vs. smallest mean gives a P-value less than 0.05, even though the null hypothesis is true, is very high. You should have done a one-way anova, model I, followed by a planned comparison (if you planned to compare these two breeds) or a Tukey-Kramer test. Note that you had to explain why the t-test was bad, not just say that it was bad and that the one-way anova would be better.
  11. If there is heteroscedasticity, an unbalanced design could increase the chance of a false positive to much greater than 0.05. Note that the unbalanced design doesn't cause heteroscedasticity; also note that you had to say why an unbalanced design was bad.
  12. One meaningful nominal variable (light source) and one measurement variable (melatonin level), so one-way anova, model I.
  13. One nominal variable (food type) and one true ranked variable (ranked from pale yellow to dark orange), so Kruskal-Wallis test.
  14. Two nominal variables (nose plugged vs. unplugged, returned vs. didn't return) and a total sample size less than 1000, so Fisher's exact test.
  15. One nominal variable (individual rabbits) with meaningless names (you don't care which rabbit has the most glycogen), one measurement variable (glycogen), so one-way anova, model II.
  16. Three nominal variables: professor, asleep vs. awake, and date, so Cochran-Mantel-Haenszel test. "CMH test" is acceptable, and I even accepted "Cauchy Hanztle Mantle," the most creative misspelling I've ever seen for this test.
  17. The larger standard error means that she is less confident of the mean; the true value could lie in a larger range. Note that this is not necessarily due to a larger standard deviation; it could represent a smaller sample size.
  18. The larger standard deviation means the normal curve of running speeds is wider for A. nasatum.
  19. You should try different transformations to see if one makes the data more normal. Note that "do a log transformation" is incorrect; while it is the most common in biology, it doesn't always work (and as it happens, would make the data in the graph even more skewed). Note that "check the normality" is not the next step, as you've already checked the normality by drawing the histogram.
  20. It is the probability that 20 or more of the 32 people would raise their left eyebrow by chance, if the null hypothesis is true.
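
Two of the answers above involve arithmetic that is easy to sketch in code. The Python below is a minimal illustration, not the method used to grade the exam. The first function computes the one-tailed probability described in answer 20; the null proportion of 0.5 is an assumption, since the answer does not state it. The second function shows how the four numbers in answer 6 combine into a sample size, assuming a two-sided comparison of two group means and a normal approximation. The effect size (0.1 m/s) and alpha (0.05) come from answer 6, but the power (0.80) and standard deviation (0.2 m/s) are hypothetical values chosen only for illustration, and the function names are made up for this sketch.

```python
from math import ceil, comb
from statistics import NormalDist


def binomial_upper_tail(successes, n, p_null):
    """Exact probability of `successes` or more out of n under the null proportion."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


def sample_size_per_group(effect_size, sd, alpha, power):
    """Approximate sample size per group for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / effect_size)^2,
    so it slightly underestimates what a t-based calculation would give.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect_size) ** 2)


# Answer 20: 20 or more out of 32, assuming a null proportion of 0.5 (assumption).
p_value = binomial_upper_tail(20, 32, 0.5)

# Answer 6: effect size and alpha are from the answer; sd and power are hypothetical.
n_per_group = sample_size_per_group(effect_size=0.1, sd=0.2, alpha=0.05, power=0.80)

print(f"P(20 or more of 32 | p = 0.5) = {p_value:.4f}")
print(f"Approximate sample size per group = {n_per_group}")
```

Changing the hypothetical standard deviation or power changes the required sample size, which is why answer 6 says you cannot finish the power analysis without those two numbers.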

This page was last revised October 25, 2012. Its URL is http://udel.edu/~mcdonald/statexam2answers.html