Choosing a statistical test

This table is designed to help you decide which statistical test or descriptive statistic is appropriate for your experiment. In order to use it, you must be able to identify all the variables in the data set and tell what kind of variables they are.

The "hidden" nominal variable in a regression is the nominal variable that groups together two or more observations; for example, in a regression of height and weight, the hidden nominal variable is the name of each person. Most texts don't count this as a variable, and you don't need to write it down (you could just group the height and weight numbers by putting them on the same line), so that's why I'm calling it "hidden."
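
As a rough sketch of how the lookup works, the table can be treated as a list of rows keyed by the number of nominal, continuous, and rank variables (not counting the "hidden" nominal variable). The Python below copies only a handful of rows from the table. The `Row` type and the `matches` and `candidates` functions are illustrative names of my own, not part of the handbook.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    test: str
    nominal: str      # table entry as written: "-", "1", "2+", ...
    continuous: str
    rank: str


# A few rows copied from the table; a full version would list every row.
ROWS = [
    Row("exact test for goodness-of-fit", "1", "-", "-"),
    Row("G-test of independence", "2+", "-", "-"),
    Row("Fisher's exact test", "2", "-", "-"),
    Row("one-way anova, model I", "1", "1", "-"),
    Row("paired t-test", "2", "1", "-"),
    Row("linear regression", "-", "2", "-"),
    Row("multiple regression", "-", "3+", "-"),
    Row("Kruskal–Wallis test", "1", "-", "1"),
    Row("Spearman rank correlation", "-", "-", "2"),
]


def matches(entry: str, count: int) -> bool:
    """Return True if a table entry such as "-", "2" or "3+" allows `count` variables."""
    if entry == "-":
        return count == 0
    if entry.endswith("+"):
        return count >= int(entry[:-1])
    return count == int(entry)


def candidates(nominal: int, continuous: int, rank: int) -> List[str]:
    """List the tests whose variable counts match the data set.

    Do not count the "hidden" nominal variable when calling this.
    """
    return [
        row.test
        for row in ROWS
        if matches(row.nominal, nominal)
        and matches(row.continuous, continuous)
        and matches(row.rank, rank)
    ]
```

For instance, `candidates(0, 2, 0)` returns only linear regression from this short list, while `candidates(2, 0, 0)` returns both the G-test of independence and Fisher's exact test. The purpose and notes columns of the table are then what separate the remaining candidates.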

| test | nominal variables | continuous variables | rank variables | purpose | notes | example |
|---|---|---|---|---|---|---|
| exact test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for small sample sizes (less than 1000) | count the number of males and females in a small sample, test fit to expected 1:1 ratio |
| G-test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for large sample sizes (greater than 1000) | count the number of red, pink and white flowers in a genetic cross, test fit to expected 1:2:1 ratio |
| Chi-square test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for large sample sizes (greater than 1000) | count the number of red, pink and white flowers in a genetic cross, test fit to expected 1:2:1 ratio |
| Randomization test for goodness-of-fit | 1 | - | - | test fit of observed frequencies to expected frequencies | used for small sample sizes (less than 1000) with a large number of categories | count the number of offspring in a trihybrid genetic cross, test fit to expected 27:9:9:9:3:3:3:1 ratio |
| G-test of independence | 2+ | - | - | test hypothesis that proportions are the same in different groups | large sample sizes (greater than 1000) | count the number of apoptotic vs. non-apoptotic cells in liver tissue of organic chemists, molecular biologists, and regular people, test the hypothesis that the proportions are the same |
| Chi-square test of independence | 2+ | - | - | test hypothesis that proportions are the same in different groups | large sample sizes (greater than 1000) | count the number of apoptotic vs. non-apoptotic cells in liver tissue of organic chemists, molecular biologists, and regular people, test the hypothesis that the proportions are the same |
| Fisher's exact test | 2 | - | - | test hypothesis that proportions are the same in different groups | used for small sample sizes (less than 1000) | count the number of left-handed vs. right-handed grad students in Biology and Animal Science, test the hypothesis that the proportions are the same |
| Randomization test of independence | 2 | - | - | test hypothesis that proportions are the same in different groups | used for small sample sizes (less than 1000) and large numbers of categories | count the number of cells in each stage of the cell cycle in two different tissues, test the hypothesis that the proportions are the same |
| Mantel-Haenszel test | 3 | - | - | test hypothesis that proportions are the same in repeated pairings of two groups | - | count the number of left-handed vs. right-handed grad students in Biology and Animal Science at several universities, test the hypothesis that the proportions are the same; alternate hypothesis is a consistent direction of difference |
| arithmetic mean | - | 1 | - | description of central tendency of data | - | - |
| median | - | 1 | - | description of central tendency of data | more useful than mean for very skewed data | median height of trees in forest, if most trees are short seedlings and the mean would be skewed by the few very tall trees |
| range | - | 1 | - | description of dispersion of data | used more in everyday life than in scientific statistics | - |
| variance | - | 1 | - | description of dispersion of data | forms the basis of many statistical tests; in squared units, so not very understandable | - |
| standard deviation | - | 1 | - | description of dispersion of data | in same units as original data, so more understandable than variance | - |
| standard error of the mean | - | 1 | - | description of accuracy of an estimate of a mean | - | - |
| confidence interval | - | 1 | - | description of accuracy of an estimate of a mean | - | - |
| one-way anova, model I | 1 | 1 | - | test the hypothesis that the mean values of the continuous variable are the same in different groups | model I: the nominal variable is meaningful, differences among groups are interesting | compare mean heavy metal content in mussels from Nova Scotia, Maine, Massachusetts, Connecticut, New York and New Jersey, to see whether there is variation in the level of pollution |
| one-way anova, model II | 1 | 1 | - | estimate the proportion of variance in the continuous variable "explained" by the nominal variable | model II: the nominal variable is somewhat arbitrary, partitioning variance is more interesting than determining which groups are different | compare mean heavy metal content in mussels from five different families raised under common conditions, to see if there is heritable variation in heavy metal uptake |
| sequential Dunn-Sidak method | 1 | 1 | - | after a significant one-way model I anova, test the homogeneity of means of planned, non-orthogonal comparisons of groups | - | compare mean heavy metal content in mussels from Nova Scotia+Maine vs. Massachusetts+Connecticut, also Nova Scotia vs. Massachusetts+Connecticut+New York |
| Gabriel's comparison intervals | 1 | 1 | - | after a significant one-way model I anova, test for significant differences between all pairs of groups | - | compare mean heavy metal content in mussels from Nova Scotia vs. Maine, Nova Scotia vs. Massachusetts, Maine vs. Massachusetts, etc. |
| Tukey-Kramer method | 1 | 1 | - | after a significant one-way model I anova, test for significant differences between all pairs of groups | - | compare mean heavy metal content in mussels from Nova Scotia vs. Maine, Nova Scotia vs. Massachusetts, Maine vs. Massachusetts, etc. |
| Bartlett's test | 1 | 1 | - | test the hypothesis that the variance of a continuous variable is the same in different groups | usually used to see whether data fit one of the assumptions of an anova | - |
| nested anova | 2+ | 1 | - | test hypothesis that the mean values of the continuous variable are the same in different groups, when each group is divided into subgroups | subgroups must be arbitrary (model II) | compare mean heavy metal content in mussels from Nova Scotia, Maine, Massachusetts, Connecticut, New York and New Jersey; several mussels from each location, with several metal measurements from each mussel |
| two-way anova | 2 | 1 | - | test the hypothesis that different groups, classified two ways, have the same means of the continuous variable | - | compare cholesterol levels in blood of male vegetarians, female vegetarians, male carnivores, and female carnivores |
| paired t-test | 2 | 1 | - | test the hypothesis that the means of the continuous variable are the same in paired data | - | compare the cholesterol level in blood of people before vs. after switching to a vegetarian diet |
| linear regression | - | 2 | - | see whether variation in an independent variable causes some of the variation in a dependent variable; estimate the value of one unmeasured variable corresponding to a measured variable | - | measure chirping speed in crickets at different temperatures, test whether variation in temperature causes variation in chirping speed; or use the estimated relationship to estimate temperature from chirping speed when no thermometer is available |
| correlation | - | 2 | - | see whether two variables covary | - | measure salt intake and fat intake in different people's diets, to see if people who eat a lot of fat also eat a lot of salt |
| multiple regression | - | 3+ | - | fit an equation relating several X variables to a single Y variable | - | measure air temperature, humidity, body mass, leg length, see how they relate to chirping speed in crickets |
| polynomial regression | - | 2 | - | test the hypothesis that an equation with X², X³, etc. fits the Y variable significantly better than a linear regression | - | - |
| analysis of covariance | 1 | 2 | - | test the hypothesis that different groups have the same regression lines | first step is to test the homogeneity of slopes; if they are not significantly different, the homogeneity of the Y-intercepts is tested | measure chirping speed vs. temperature in four species of crickets, see if there is significant variation among the species in the slope or Y-intercept of the relationships |
| sign test | 2 | - | 1 | test randomness of direction of difference in paired data | often used as a non-parametric alternative to a paired t-test | compare the cholesterol level in blood of people before vs. after switching to a vegetarian diet, only record whether it is higher or lower after the switch |
| Kruskal–Wallis test | 1 | - | 1 | test the hypothesis that rankings are the same in different groups | often used as a non-parametric alternative to one-way anova | 40 ears of corn (8 from each of 5 varieties) are judged for tastiness, and the mean rank is compared among varieties |
| Spearman rank correlation | - | - | 2 | see whether the ranks of two variables covary | often used as a non-parametric alternative to regression or correlation | 40 ears of corn (8 from each of 5 varieties) are judged for tastiness and prettiness, see whether prettier corn is also tastier |
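
To make the goodness-of-fit rows a little more concrete, here is a minimal sketch in standard-library Python of the chi-square and G statistics for the flower-color example, tested against a 1:2:1 ratio. The counts are made up purely for illustration (chosen to total more than 1000, in line with the table's notes), and the function names are my own. With three categories there are 2 degrees of freedom, and in that special case the chi-square tail probability is exactly exp(-x/2), so no statistics library is needed; other numbers of categories would need a general chi-square distribution function.

```python
import math


def expected_counts(observed, ratio):
    """Scale an expected ratio such as 1:2:1 to the observed total."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    return [total * r / ratio_sum for r in ratio]


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_stat(observed, expected):
    """G statistic: 2 * sum of O * ln(O / E); empty categories contribute 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def p_value_df2(stat):
    """Upper-tail chi-square probability, valid ONLY for 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical counts of red, pink and white flowers (not real data).
observed = [320, 600, 280]
expected = expected_counts(observed, [1, 2, 1])

chi2 = chi_square_stat(observed, expected)
g = g_stat(observed, expected)

print(f"chi-square = {chi2:.3f}, P = {p_value_df2(chi2):.3f}")
print(f"G          = {g:.3f}, P = {p_value_df2(g):.3f}")
```

With samples this large the two statistics come out nearly equal, which is why the table gives the G-test and the chi-square test the same row entries.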



This page was last revised October 10, 2008. Its address is http://udel.edu/~mcdonald/statbigchart.html. It may be cited as pp. 308-313 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.