One-way anova: Introduction
When to use it
Analysis of variance (anova) is the most commonly used technique for comparing the means of groups of measurement data. There are lots of different experimental designs that can be analyzed with different kinds of anova; in this handbook, I describe only one-way anova, nested anova and two-way anova.
In a one-way anova (also known as a single-classification anova), there is one measurement variable and one nominal variable. Multiple observations of the measurement variable are made for each value of the nominal variable. For example, you could measure the amount of transcript of a particular gene for multiple samples taken from arm muscle, heart muscle, brain, liver, and lung. The transcript amount would be the measurement variable, and the tissue type (arm muscle, brain, etc.) would be the nominal variable.
Null hypothesis
The statistical null hypothesis is that the means of the measurement variable are the same for the different categories of data; the alternative hypothesis is that they are not all the same.
How the test works
The basic idea is to calculate the mean of the observations within each group, then compare the variance among these means to the average variance within each group. Under the null hypothesis that the observations in the different groups all have the same mean, the weighted among-group variance is expected to be the same as the within-group variance. As the means get further apart, the variance among the means increases. The test statistic is thus the variance among means divided by the average variance within groups, or Fs. This statistic has a known distribution under the null hypothesis, so the probability of obtaining the observed Fs under the null hypothesis can be calculated.
The shape of the F-distribution depends on two degrees of freedom, the degrees of freedom of the numerator (among-group variance) and the degrees of freedom of the denominator (within-group variance). The among-group degrees of freedom is the number of groups minus one. The within-group degrees of freedom is the total number of observations, minus the number of groups. Thus if there are n observations in total, divided among a groups, the numerator degrees of freedom is a-1 and the denominator degrees of freedom is n-a.
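To make the arithmetic concrete, here is a minimal sketch in Python that computes Fs and the two degrees of freedom as described above. The data values and the function name one_way_anova are invented for illustration; they are not from any real experiment.

```python
from statistics import mean

# Hypothetical transcript amounts, invented for illustration only.
groups = {
    "arm muscle": [9.0, 11.0, 10.0, 12.0],
    "heart muscle": [10.0, 8.0, 11.0, 11.0],
    "brain": [15.0, 14.0, 17.0, 14.0],
}

def one_way_anova(samples):
    """Return (Fs, numerator df, denominator df) for a list of groups."""
    a = len(samples)                      # number of groups
    n = sum(len(g) for g in samples)      # total number of observations
    grand_mean = mean(x for g in samples for x in g)
    # Among-group sum of squares, weighted by the size of each group.
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    # Within-group sum of squares, pooled over all groups.
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)
    df_among = a - 1
    df_within = n - a
    ms_among = ss_among / df_among        # variance among means (weighted)
    ms_within = ss_within / df_within     # average variance within groups
    return ms_among / ms_within, df_among, df_within

fs, df_num, df_den = one_way_anova(list(groups.values()))
print(f"Fs = {fs:.2f} with {df_num} and {df_den} degrees of freedom")
```

The P value is the upper-tail probability of the F-distribution at the observed Fs. If SciPy happens to be available, scipy.stats.f.sf(fs, df_num, df_den) is one way to get it; the sketch leaves that step out so it runs with the standard library alone.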
Steps in performing a one-way anova
- Decide whether you are going to do a Model I or Model II anova.
- If you are going to do a Model I anova, decide whether you will do planned comparisons of means or unplanned comparisons of means. A planned comparison is where you compare the means of certain subsets of the groups that you have chosen in advance. In the arm muscle, heart muscle, brain, liver, lung example, an obvious planned comparison might be muscle (arm and heart) vs. non-muscle (brain, liver, lung) tissue. An unplanned comparison is done when you look at the data and then notice that something looks interesting and compare it. If you looked at the data and then noticed that the lung had the highest expression and the brain had the lowest expression, and you then compared just lung vs. brain, that would be an unplanned comparison. The important point is that planned comparisons must be planned before analyzing the data (or even collecting them, to be strict about it).
- If you are going to do planned comparisons, decide which comparisons you will do. If you are going to do unplanned comparisons, decide which technique you will use.
- Collect your data.
- Make sure the data do not violate the assumptions of the anova (normality and homoscedasticity) too severely. If the data do not fit the assumptions, try to find a data transformation that makes them fit. If this doesn't work, do a Kruskal–Wallis test instead of a one-way anova.
- If the data do fit the assumptions of an anova, test the heterogeneity of the means.
- If you are doing a Model I anova, do your planned or unplanned comparisons among means.
- If the means are significantly heterogeneous, and you are doing a Model II anova, estimate the variance components (the proportion of variation that is among groups and the proportion that is within groups).
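The last step, estimating variance components for a Model II anova, can be sketched as follows. This assumes equal sample sizes in every group and uses the common textbook estimate in which the among-group variance component is (MS among - MS within) divided by the sample size per group. The data and the name variance_components are hypothetical.

```python
from statistics import mean

# Hypothetical measurements from three randomly chosen groups
# (for example, families); invented for illustration only.
families = [
    [9.0, 11.0, 10.0, 12.0],
    [10.0, 8.0, 11.0, 11.0],
    [15.0, 14.0, 17.0, 14.0],
]

def variance_components(samples):
    """Return the proportions of variance (among, within), equal group sizes only."""
    sizes = {len(g) for g in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal sample sizes")
    n_per_group = sizes.pop()
    a = len(samples)
    grand_mean = mean(x for g in samples for x in g)
    ms_among = sum(n_per_group * (mean(g) - grand_mean) ** 2 for g in samples) / (a - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in samples for x in g) / (a * (n_per_group - 1))
    # A negative estimate is conventionally treated as zero.
    var_among = max((ms_among - ms_within) / n_per_group, 0.0)
    total = var_among + ms_within
    return var_among / total, ms_within / total

prop_among, prop_within = variance_components(families)
print(f"among groups: {prop_among:.1%}, within groups: {prop_within:.1%}")
```

With unequal sample sizes the divisor is no longer simply the group size, so this sketch refuses such input rather than giving a misleading answer.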
Similar tests
If you have only two groups, you can do a Student's t-test. This is mathematically equivalent to an anova, so if all you'll ever do is comparisons of two groups, you might as well use t-tests. If you're going to do some comparisons of two groups, and some with more than two groups, it will probably be less confusing if you call all of your tests one-way anovas.
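The equivalence can be checked numerically: with two groups, the square of Student's t (using the pooled variance, not the unequal-variance version) equals Fs. The sketch below uses invented data and hypothetical helper names.

```python
from statistics import mean, variance

# Hypothetical measurements for two groups, invented for illustration.
arm = [9.0, 11.0, 10.0, 12.0]
brain = [15.0, 14.0, 17.0, 14.0]

def student_t(x, y):
    """Student's t with a pooled variance estimate (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5

def anova_f(samples):
    """Fs: among-group mean square divided by within-group mean square."""
    a = len(samples)
    n = sum(len(g) for g in samples)
    grand_mean = mean(x for g in samples for x in g)
    ms_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples) / (a - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in samples for x in g) / (n - a)
    return ms_among / ms_within

t = student_t(arm, brain)
fs = anova_f([arm, brain])
print(f"t squared = {t ** 2:.4f}, Fs = {fs:.4f}")
```

The two printed numbers should agree apart from floating-point rounding, which is why the two tests give the same P value.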
If there are two or more nominal variables, you should use a two-way anova, a nested anova, or something more complicated that I won't cover here. If you're tempted to do a very complicated anova, you may want to break your experiment down into a set of simpler experiments for the sake of comprehensibility.
If the data severely violate the assumptions of the anova, you should use the Kruskal–Wallis test, a non-parametric test, instead.
Power analysis
Doing a power analysis for a one-way anova is kind of tricky. Not only do you need an estimate of the standard deviation within groups, you also need to decide what kind of significant result you're looking for.
If you're mainly interested in the overall significance test, the sample size needed is a function of the standard deviation of the group means. For example, if you're studying transcript amount of some gene in arm muscle, heart muscle, brain, liver, and lung, you might decide that you'd like it to be significant if the means were 10 units in arm muscle, 10 units in heart muscle, 15 units in brain, 15 units in liver, and 15 units in lung. Those five numbers have a standard deviation of 2.74. Your estimate of the standard deviation of means that you're looking for may be based on a pilot experiment or published literature on similar experiments.
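The 2.74 figure is the sample standard deviation (with n-1 in the denominator) of the five hoped-for means. A quick check in Python, with the variable names being my own:

```python
from statistics import stdev

# The five group means from the example above.
target_means = [10, 10, 15, 15, 15]

# stdev() uses the n-1 denominator (sample standard deviation).
sd_of_means = stdev(target_means)
print(round(sd_of_means, 2))  # 2.74
```

Note that the population standard deviation (statistics.pstdev, with n in the denominator) would give a smaller number, so it matters which one you enter.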
If you're mainly interested in the planned or unplanned comparisons of means, you need to decide ahead of time what method you're going to use. The web page described below works for the Dunnett, Tukey/HSD, Bonferroni, and Scheffé methods.
Here is a web page for determining the sample size needed for a one-way anova. In the box on the left of that page, choose "Balanced ANOVA--Any Model" and click on the "Run Selection" button. You'll get a popup window titled "Select an ANOVA model"; leave everything with the defaults. If you're mainly interested in comparisons of means, click on the "Differences/Contrasts" button and see if you can figure it out; I couldn't.
To do a power analysis for the overall significance test, click on the F-tests button. Each variable has a slider bar and an obscure little square that you can click on to type in the number. For "levels[treatment]", enter the number of groups. Enter your estimate of the standard deviation among groups that you'd like to detect under "SD[treatment]", and enter your estimate of the standard deviation within groups under "SD[within]". For "n[Within]", enter a guess of the sample size (it must be the same in each group). Increase or decrease "n[Within]" until the "Power[treatment]" is your desired power (1-beta). For example, if you want to detect a significant (P<0.05) difference among 5 means with a standard deviation among means of 2.74, a standard deviation within groups of 5, and a beta (probability of a false negative) of 0.10, enter the appropriate numbers and then try different sample sizes until you hit "n[Within]" of 14, which gives you a "Power[treatment]" of 0.90.
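If you would rather check the example without the web page, a rough alternative is to estimate power by simulation. The sketch below is not the web page's calculation; it is a Monte Carlo approximation that assumes normally distributed data, equal sample sizes, and the same standard deviation in every group. The function names, the number of trials, and the random seed are arbitrary choices of mine.

```python
import random

def f_statistic(samples):
    """Fs: among-group mean square divided by within-group mean square."""
    a = len(samples)
    n = sum(len(g) for g in samples)
    group_means = [sum(g) / len(g) for g in samples]
    grand_mean = sum(sum(g) for g in samples) / n
    ms_among = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(samples, group_means)) / (a - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(samples, group_means) for x in g) / (n - a)
    return ms_among / ms_within

def simulate_f(means, sd_within, n_per_group, rng):
    """Draw one simulated experiment and return its Fs."""
    samples = [[rng.gauss(m, sd_within) for _ in range(n_per_group)] for m in means]
    return f_statistic(samples)

def estimate_power(means, sd_within, n_per_group, alpha=0.05, trials=4000, seed=1):
    """Estimate power as the fraction of simulated experiments that are significant."""
    rng = random.Random(seed)
    # Simulate the null hypothesis (all means equal) to find the critical Fs.
    null_f = sorted(simulate_f([0.0] * len(means), sd_within, n_per_group, rng)
                    for _ in range(trials))
    critical = null_f[int((1 - alpha) * trials)]
    # Simulate the alternative and count how often Fs exceeds the critical value.
    hits = sum(simulate_f(means, sd_within, n_per_group, rng) > critical
               for _ in range(trials))
    return hits / trials

power = estimate_power([10, 10, 15, 15, 15], sd_within=5, n_per_group=14)
print(f"estimated power: {power:.2f}")
```

Because this is a simulation, the estimate varies a little with the seed and the number of trials; it should come out close to the 0.90 quoted above, and you can change n_per_group to see how power responds to sample size.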
Further reading
Sokal and Rohlf, pp. 206-217.
Zar, pp. 177-195.
This page was last revised August 7, 2008. Its address is http://udel.edu/~mcdonald/statanovaintro.html. It may be cited as pp. 115-118 in: McDonald, J.H. 2008. Handbook of Biological Statistics. Sparky House Publishing, Baltimore, Maryland.
©2008 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.