Biological Data Analysis:
Final exam study guide

This is the first part of the study guide for the final exam in Biological Data Analysis, spring 2017. There are also four sets of practice questions, each as long as the real exam (30 questions). I recommend that you spend some time studying, then try to answer the practice questions under test conditions (no book or notes, timed, in a room full of people who are eerily quiet).

The exam will be Monday, May 22, 3:30-5:30 p.m. in 205 Gore (the lecture room). If you are unable to take the exam at that time, let me know as soon as possible so we can schedule a makeup exam.

You may not use your notes or textbook during the exam; if English is your second language, you may use a dictionary. You will not need a calculator.

The exam will consist of 30 questions, each worth 1.5 points. About 25 will be of the format you've seen on the previous exams: I will describe a data set, and you will say which statistical test would be best to use. Your answers must be specific. For chi-squared or G-tests, you must specify whether it is a goodness-of-fit test or a test of independence. For t-tests, you must specify one-sample, two-sample, or paired. For anovas, you must specify one-way, two-way, or nested. If the answer is a two-way anova, you must specify with or without replication. For regression, you must specify linear regression, polynomial regression, multiple linear regression, simple logistic regression, or multiple logistic regression. If there are two equally appropriate tests (such as G-test or chi-squared test, or two-sample t-test or one-way anova), write down only one of them. For the purposes of this exam, "correlation" and "linear regression" are considered equivalent; you may write down one or the other, or write "correlation/linear regression."

Unless something in the question makes it clear that an assumption is violated, you should assume that all data meet the parametric assumptions (normality and homoscedasticity), that all correlations/regressions are linear, and that the observations are independent.

You will need to know when to use the following tests:

You may use "two-sample t-test" as an answer, but you do not have to, since it is mathematically equivalent to one-way anova; you can just use "one-way anova" whether there are two or more than two categories. Likewise, you can use "Welch's t-test" instead of "Welch's anova" and "paired t-test" instead of "two-way anova without replication," but you do not have to. Of course, if there are more than two categories, two-sample t-test, Welch's t-test, and paired t-test will be incorrect.
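If you want to convince yourself of these equivalences, the following is a minimal sketch in plain Python. It assumes the ordinary (pooled-variance) two-sample t-test, not Welch's; the data values and function names are made up for illustration and do not come from the textbook. In each pair of tests the anova F statistic should equal the square of the t statistic, which is why the two tests give the same P value.

```python
from statistics import mean

def two_sample_t(a, b):
    """Student's two-sample t statistic, using the pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_anova_f(groups):
    """One-way anova F: among-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_among = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_among / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))

def paired_t(first, second):
    """Paired t statistic: mean difference / standard error of the differences."""
    d = [y - x for x, y in zip(first, second)]
    n, md = len(d), mean(d)
    var_d = sum((di - md) ** 2 for di in d) / (n - 1)
    return md / (var_d / n) ** 0.5

def two_way_no_replication_f(rows):
    """F for the column factor in a two-way anova without replication.

    Each row is one individual (block); each column is one treatment.
    """
    r, c = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(c)]
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Residual: what is left after removing the row and column effects.
    ss_resid = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    return (ss_cols / (c - 1)) / (ss_resid / ((r - 1) * (c - 1)))

# Made-up measurements, for illustration only.
group_a = [4.1, 5.0, 4.7, 5.3, 4.4]
group_b = [5.9, 6.2, 5.1, 6.8, 6.0]

t_two = two_sample_t(group_a, group_b)
f_one_way = one_way_anova_f([group_a, group_b])
print(f"two-sample t^2 = {t_two ** 2:.6f}, one-way anova F = {f_one_way:.6f}")

# Treat the same numbers as before/after pairs on five individuals.
t_pair = paired_t(group_a, group_b)
f_two_way = two_way_no_replication_f([list(p) for p in zip(group_a, group_b)])
print(f"paired t^2 = {t_pair ** 2:.6f}, two-way anova F = {f_two_way:.6f}")
```

The equality only holds with two categories; with three or more groups there is no single t statistic to square, which is why the t-test answers are wrong in that case.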

The textbook includes repeated G-tests of goodness-of-fit, Wilcoxon signed-rank test, sign test, and maybe one or two others that I'm forgetting. These are not on the syllabus and we didn't talk about them in class, so you do not need them for the exam.

About 5 questions will be on other material. You should know the assumptions of the different tests, how you tell whether those assumptions are met, and what to do if they're not met. You should be familiar with the different descriptive statistics, what they mean and what they're useful for. You should understand data transformation, multiple comparisons, and meta-analysis. You should be able to interpret the results of the different tests; for example, you should be able to explain what a significant interaction term means in a two-way anova, and how to interpret the partitioning of variance in a one-way or nested anova.


Go to practice exam 1

Go to practice exam 2

Go to practice exam 3

Go to practice exam 4

