Final exam study guide

This is the first part of the study guide for the final exam in Biological Data Analysis, spring 2018. There are also four sets of practice questions, each as long as the real exam (30 questions). I recommend that you spend some time studying, then try to answer the practice questions under test conditions (no book or notes, timed, in a room full of people who are eerily quiet).

The final exam will be offered at two times: 3:30 to 5:30 p.m. on Thursday, May 17, in 243 Wolf; or 1 to 3 p.m. on Thursday, May 24, in 205 Gore (the regular lecture room). You do not have to decide which day you'll take the exam until May 17.

You may not use your notes or textbook during the exam; if English is your second language, you may use a dictionary. You will not need a calculator.

The exam will consist of 30 questions, each worth 1.5 points. About 25 will be of the format you've seen on the previous exams: I will describe a data set, and you will say what the best statistical test to use would be. Your answers must be specific. For chi-square or G-tests, you must specify whether it is a goodness-of-fit test or a test of independence. For t-tests, you must specify one-sample, Student's two-sample, Welch's two-sample, or paired. For anovas, you must specify Fisher's one-way, Welch's one-way, two-way, or nested. **If the answer is a two-way anova, you must specify with or without replication**. For regression, you must specify linear regression, curvilinear regression, multiple linear regression, simple logistic regression, or multiple logistic regression. If there are two equally appropriate tests (such as a G-test or chi-square test, or a two-sample t-test or one-way anova), write down only one of them. For the purposes of this exam, "correlation" and "linear regression" are considered equivalent; you may write down one or the other, or write "correlation/linear regression."

**Unless something in the question makes it clear that an assumption is violated,** you should assume that all data meet the parametric assumptions (normality and homoscedasticity), that all correlations/regressions are linear, and that the observations are independent.

You will need to know when to use the following tests:

- exact test of goodness-of-fit
- chi-square or G-test of goodness-of-fit
- Fisher's exact test
- chi-square or G-test of independence
- Cochran-Mantel-Haenszel test
- one-sample t-test
- Fisher's one-way anova
- Welch's one-way anova
- Tukey-Kramer test
- Kruskal-Wallis test
- nested anova
- two-way anova with replication
- two-way anova without replication
- linear regression/correlation
- Spearman's rank correlation
- curvilinear regression
- ancova
- multiple linear regression
- simple logistic regression
- multiple logistic regression
- Bonferroni correction, Benjamini-Hochberg procedure (either is correct for any question)
- meta-analysis
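
Of the procedures in the list above, the two multiple-comparison methods are simple enough to sketch in a few lines. The following is only an illustration of the logic, not a required method: the function names, the default thresholds of 0.05, and the five P values are all made up for this example.

```python
def bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where a test is significant
    after the Bonferroni correction (P < alpha / number of tests)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where a test is significant under
    the Benjamini-Hochberg procedure at the given false discovery rate."""
    m = len(p_values)
    # Rank the P values from smallest (rank 1) to largest (rank m).
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank i whose P value is below (i / m) * fdr.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] < rank / m * fdr:
            largest_rank = rank
    # That test and every test with a smaller P value are significant.
    significant = [False] * m
    for idx in order[:largest_rank]:
        significant[idx] = True
    return significant


# Hypothetical P values from five tests (made up for illustration).
p = [0.001, 0.012, 0.019, 0.045, 0.300]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

With these P values, the Bonferroni correction keeps only the smallest one significant, while the Benjamini-Hochberg procedure keeps the three smallest. The Bonferroni correction controls the chance of even one false positive; the Benjamini-Hochberg procedure controls the proportion of false positives among the significant results, so it is less conservative.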

You may use "Student's two-sample t-test" as an answer, but you do not have to, since it is mathematically equivalent to Fisher's one-way anova; you can just use "Fisher's one-way anova" whether there are two or more than two categories. Likewise, you can use "Welch's two-sample t-test" instead of "Welch's one-way anova" and "paired t-test" instead of "two-way anova without replication," but you do not have to. Of course, if there are more than two categories, Student's two-sample t-test, Welch's two-sample t-test, and paired t-test will all be incorrect.
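
If you want to convince yourself of the equivalence between Student's two-sample t-test and Fisher's one-way anova, here is a minimal sketch that computes both test statistics by hand for two groups. The function names and the measurements are hypothetical, made up for this example.

```python
from statistics import mean


def student_t(a, b):
    """Student's two-sample t statistic (pooled variance)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_anova_f(*groups):
    """Fisher's one-way anova F statistic for any number of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_among / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two groups (made up for illustration).
group_1 = [4.1, 5.0, 5.6, 6.2, 4.8]
group_2 = [6.0, 6.9, 7.4, 5.8, 7.1, 6.5]

t = student_t(group_1, group_2)
f = one_way_anova_f(group_1, group_2)
print(t ** 2, f)  # the two numbers agree up to rounding error
```

With two groups, the anova F statistic is the square of the t statistic, and the two tests give the same P value. The anova function accepts three or more groups as well, which is why "Fisher's one-way anova" is a safe answer for any number of categories.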

"Polynomial regression" is one particular kind of the broader category of "curvilinear regression." We mostly talked about polynomial regression in class, because it's the most common form of curvilinear regression. For any exam question that tells you the relationship between two measurement variables is curved, you may say either "curvilinear regression" or "polynomial regression."

The textbook includes repeated G-tests of goodness-of-fit, Wilcoxon signed-rank test, sign test, and maybe one or two others that I'm forgetting. These are not on the syllabus and we didn't talk about them in class, so you do *not* need them for the exam.

About 5 questions will be on other material. You should know the assumptions of the different tests, how you tell whether those assumptions are met, and what to do if they're not met. You should be familiar with the different descriptive statistics, what they mean and what they're useful for. You should understand data transformation, multiple comparisons, and meta-analysis. You should be able to interpret the results of the different tests; for example, you should be able to explain what a significant interaction term means in a two-way anova, and how to interpret the partitioning of variance in a one-way or nested anova.

The first two practice exams include material through multiple linear regression; they do not include logistic regression, multiple comparisons, or meta-analysis:

Go to practice exam 1

Go to practice exam 2

The last two practice exams include all of the material:

Go to practice exam 3

Go to practice exam 4