# Choosing a statistical test

This table is designed to help you decide which statistical test or descriptive statistic is appropriate for your experiment. In order to use it, you must be able to identify all the variables in the data set and tell what kind of variables they are.

The "hidden" nominal variable in a regression is the nominal variable that groups together two or more observations; for example, in a regression of height and weight, the hidden nominal variable is the name of each person. Most texts don't count this as a variable, and you don't need to write it down (you could just group the height and weight numbers by putting them on the same line), so that's why I'm calling it "hidden."
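As a small illustration of the hidden nominal variable, here is a minimal Python sketch. The names and measurements are invented purely for the example. The point is only that the name can be dropped as long as each height stays on the same "line" as its weight.

```python
# Hypothetical data: each row is one person. The name is the "hidden"
# nominal variable that ties a height to a weight.
observations = [
    ("Alice", 162.0, 55.0),   # (name, height in cm, weight in kg)
    ("Bob", 180.0, 82.0),
    ("Carmen", 171.0, 64.0),
]

# Dropping the name loses nothing a regression needs, because each
# (height, weight) pair still sits together in one tuple.
pairs = [(height, weight) for _name, height, weight in observations]

# Splitting into columns is fine as long as the row order is preserved;
# sorting one column on its own would destroy the pairing.
heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]

print(pairs)
```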

| test | nominal variables | continuous variables | rank variables | purpose | notes | example |
|---|---|---|---|---|---|---|
| exact test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for small sample sizes (less than 1000) | count the number of males and females in a small sample, test fit to expected 1:1 ratio |
| G-test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for large sample sizes (greater than 1000) | count the number of red, pink and white flowers in a genetic cross, test fit to expected 1:2:1 ratio |
| Chi-square test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for large sample sizes (greater than 1000) | count the number of red, pink and white flowers in a genetic cross, test fit to expected 1:2:1 ratio |
| Randomization test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for small sample sizes (less than 1000) with a large number of categories | count the number of offspring in a trihybrid genetic cross, test fit to expected 27:9:9:9:3:3:3:1 ratio |
| G-test of independence | 2+ | - | - | test hypothesis that proportions are the same in different groups | large sample sizes (greater than 1000) | count the number of apoptotic vs. non-apoptotic cells in liver tissue of organic chemists, molecular biologists, and regular people, test the hypothesis that the proportions are the same |
| Chi-square test of independence | 2+ | - | - | test hypothesis that proportions are the same in different groups | large sample sizes (greater than 1000) | count the number of apoptotic vs. non-apoptotic cells in liver tissue of organic chemists, molecular biologists, and regular people, test the hypothesis that the proportions are the same |
| Fisher's exact test | 2 | - | - | test hypothesis that proportions are the same in different groups | used for small sample sizes (less than 1000) | count the number of left-handed vs. right-handed grad students in Biology and Animal Science, test the hypothesis that the proportions are the same |
| Randomization test of independence | 2 | - | - | test hypothesis that proportions are the same in different groups | used for small sample sizes (less than 1000) and large numbers of categories | count the number of cells in each stage of the cell cycle in two different tissues, test the hypothesis that the proportions are the same |
| Mantel-Haenszel test | 3 | - | - | test hypothesis that proportions are the same in repeated pairings of two groups | - | count the number of left-handed vs. right-handed grad students in Biology and Animal Science at several universities, test the hypothesis that the proportions are the same; alternate hypothesis is a consistent direction of difference |
| arithmetic mean | - | 1 | - | description of central tendency of data | - | - |
| median | - | 1 | - | description of central tendency of data | more useful than mean for very skewed data | median height of trees in forest, if most trees are short seedlings and the mean would be skewed by the few very tall trees |
| range | - | 1 | - | description of dispersion of data | used more in everyday life than in scientific statistics | - |
| variance | - | 1 | - | description of dispersion of data | forms the basis of many statistical tests; in squared units, so not very understandable | - |
| standard deviation | - | 1 | - | description of dispersion of data | in same units as original data, so more understandable than variance | - |
| standard error of the mean | - | 1 | - | description of accuracy of an estimate of a mean | - | - |
| confidence interval | - | 1 | - | description of accuracy of an estimate of a mean | - | - |
| one-way anova, model I | 1 | 1 | - | test the hypothesis that the mean values of the continuous variable are the same in different groups | model I: the nominal variable is meaningful, differences among groups are interesting | compare mean heavy metal content in mussels from Nova Scotia, Maine, Massachusetts, Connecticut, New York and New Jersey, to see whether there is variation in the level of pollution |
| one-way anova, model II | 1 | 1 | - | estimate the proportion of variance in the continuous variable "explained" by the nominal variable | model II: the nominal variable is somewhat arbitrary, partitioning variance is more interesting than determining which groups are different | compare mean heavy metal content in mussels from five different families raised under common conditions, to see if there is heritable variation in heavy metal uptake |
| sequential Dunn-Sidak method | 1 | 1 | - | after a significant one-way model I anova, test the homogeneity of means of planned, non-orthogonal comparisons of groups | - | compare mean heavy metal content in mussels from Nova Scotia+Maine vs. Massachusetts+Connecticut, also Nova Scotia vs. Massachusetts+Connecticut+New York |
| Gabriel's comparison intervals | 1 | 1 | - | after a significant one-way model I anova, test for significant differences between all pairs of groups | - | compare mean heavy metal content in mussels from Nova Scotia vs. Maine, Nova Scotia vs. Massachusetts, Maine vs. Massachusetts, etc. |
| Tukey-Kramer method | 1 | 1 | - | after a significant one-way model I anova, test for significant differences between all pairs of groups | - | compare mean heavy metal content in mussels from Nova Scotia vs. Maine, Nova Scotia vs. Massachusetts, Maine vs. Massachusetts, etc. |
| Bartlett's test | 1 | 1 | - | test the hypothesis that the variance of a continuous variable is the same in different groups | usually used to see whether data fit one of the assumptions of an anova | - |
| nested anova | 2+ | 1 | - | test hypothesis that the mean values of the continuous variable are the same in different groups, when each group is divided into subgroups | subgroups must be arbitrary (model II) | compare mean heavy metal content in mussels from Nova Scotia, Maine, Massachusetts, Connecticut, New York and New Jersey; several mussels from each location, with several metal measurements from each mussel |
| two-way anova | 2 | 1 | - | test the hypothesis that different groups, classified two ways, have the same means of the continuous variable | - | compare cholesterol levels in blood of male vegetarians, female vegetarians, male carnivores, and female carnivores |
| paired t-test | 2 | 1 | - | test the hypothesis that the means of the continuous variable are the same in paired data | - | compare the cholesterol level in blood of people before vs. after switching to a vegetarian diet |
| linear regression | - | 2 | - | see whether variation in an independent variable causes some of the variation in a dependent variable; estimate the value of one unmeasured variable corresponding to a measured variable | - | measure chirping speed in crickets at different temperatures, test whether variation in temperature causes variation in chirping speed; or use the estimated relationship to estimate temperature from chirping speed when no thermometer is available |
| correlation | - | 2 | - | see whether two variables covary | - | measure salt intake and fat intake in different people's diets, to see if people who eat a lot of fat also eat a lot of salt |
| multiple regression | - | 3+ | - | fit an equation relating several X variables to a single Y variable | - | measure air temperature, humidity, body mass, leg length, see how they relate to chirping speed in crickets |
| polynomial regression | - | 2 | - | test the hypothesis that an equation with X², X³, etc. fits the Y variable significantly better than a linear regression | - | - |
| analysis of covariance | 1 | 2 | - | test the hypothesis that different groups have the same regression lines | first step is to test the homogeneity of slopes; if they are not significantly different, the homogeneity of the Y-intercepts is tested | measure chirping speed vs. temperature in four species of crickets, see if there is significant variation among the species in the slope or y-intercept of the relationships |
| sign test | 2 | - | 1 | test randomness of direction of difference in paired data | often used as a non-parametric alternative to a paired t-test | compare the cholesterol level in blood of people before vs. after switching to a vegetarian diet, only record whether it is higher or lower after the switch |
| Kruskal–Wallis test | 1 | - | 1 | test the hypothesis that rankings are the same in different groups | often used as a non-parametric alternative to one-way anova | 40 ears of corn (8 from each of 5 varieties) are judged for tastiness, and the mean rank is compared among varieties |
| Spearman rank correlation | - | - | 2 | see whether the ranks of two variables covary | often used as a non-parametric alternative to regression or correlation | 40 ears of corn (8 from each of 5 varieties) are judged for tastiness and prettiness, see whether prettier corn is also tastier |
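To show how the first few rows of the table translate into arithmetic, here is a minimal Python sketch of the goodness-of-fit tests, using only the standard library. Everything in it is an assumption made for illustration: the counts are invented, the function names `exact_binomial_p` and `chi_square_and_g` are hypothetical, and the two-tailed exact P-value uses one common convention (summing the probabilities of all outcomes no more likely than the observed one). In practice you would probably use a statistics package instead.

```python
from math import comb, exp, log


def exact_binomial_p(successes, n, expected_p=0.5):
    """Two-tailed exact test of goodness-of-fit for two categories."""
    def prob(k):
        return comb(n, k) * expected_p**k * (1 - expected_p) ** (n - k)

    observed = prob(successes)
    # Sum every outcome at most as likely as the observed one; the small
    # relative tolerance keeps float rounding from excluding exact ties.
    limit = observed * (1 + 1e-9)
    return sum(prob(k) for k in range(n + 1) if prob(k) <= limit)


def chi_square_and_g(observed, ratio):
    """Return the chi-square and G statistics for an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # A category with zero observations contributes nothing to G.
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    return chi2, g


# Small sample (hypothetical): 7 males and 3 females, expected 1:1.
p_sex = exact_binomial_p(7, 10)

# Large sample (hypothetical): 520 red, 1000 pink, 480 white, expected 1:2:1.
chi2, g = chi_square_and_g([520, 1000, 480], [1, 2, 1])

# Three categories give 2 degrees of freedom. For exactly 2 degrees of
# freedom the chi-square tail probability has the closed form exp(-x / 2);
# other degrees of freedom need a table or a statistics library.
p_chi2 = exp(-chi2 / 2)
p_g = exp(-g / 2)

print(f"exact test P = {p_sex:.4f}")
print(f"chi-square = {chi2:.3f}, P = {p_chi2:.4f}")
print(f"G = {g:.3f}, P = {p_g:.4f}")
```

With a large sample like this one, the chi-square and G statistics come out nearly identical, which is why the table gives the two tests the same purpose, notes and example.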