Once you have chosen between a model I and model II anova, the next step is to test the homogeneity of means. The null hypothesis is that all the groups have the same mean, and the alternative hypothesis is that at least one of the means is different from the others.
To test the null hypothesis, the variance of the population is estimated in two different ways. I'll explain this in a way that is strictly correct only for a "balanced" one-way anova, one in which the sample size for each group is the same, but the basic concept is the same for unbalanced anovas.
If the null hypothesis is true, all the groups are samples from populations with the same mean. One of the assumptions of the anova is that the populations have the same variance, too. One way to estimate this variance starts by calculating the variance within each sample—take the difference between each observation and its group's mean, square it, then sum these squared deviates and divide by the number of observations in the group minus one. Once you've estimated the variance within each group, you can take the average of these variances. This is called the "within-group mean square," or MSwithin.
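As a sketch of this calculation in Python (with made-up numbers; the function name is just for illustration), the within-group mean square is the average of the per-group sample variances:

```python
from statistics import mean, variance

def ms_within(groups):
    """Average the sample variances of the groups (balanced case: every
    group has the same n). statistics.variance divides by n - 1, matching
    the text's "number of observations in the group minus one"."""
    return mean(variance(g) for g in groups)

# Two hypothetical groups of three observations each;
# their variances are 1.0 and 4.0, so MSwithin is 2.5:
print(ms_within([[1, 2, 3], [2, 4, 6]]))
```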
For another way to estimate the variance within groups, remember that if you take repeated samples of a population, you expect the means you get from the multiple samples to have a standard deviation that equals the standard deviation within groups divided by the square root of n; this is the definition of standard error of the mean, or

SDmeans = SDwithin/√n

Remember that the standard deviation is just the square root of the variance, so squaring both sides of this gives:

Varmeans = Varwithin/n
so the second way of estimating the variance within groups is n×Varmeans, the sample size within a group times the variance of the group means. This quantity is known as the among-group mean square, abbreviated MSamong or MSgroup.
If the null hypothesis is true and the groups are all samples from populations with the same mean, the two estimates of within-group variance, MSwithin and MSamong, should be about the same; they're just different ways of estimating the same quantity. Dividing MSamong by MSwithin should therefore be around 1. This quantity, MSamong/MSwithin, is known as Fs, and it is the test statistic for the anova.
If the null hypothesis is not true, and the groups are samples of populations with different means, then MSamong will be bigger than MSwithin, and Fs will be greater than 1. To illustrate this, here are two sets of five samples (n=20) taken from normally distributed populations. The first set of five samples is from populations with a mean of 5; the null hypothesis, that the populations all have the same mean, is true.
|Five samples (n=20) from populations with parametric means of 5. Red bars indicate sample means.|
The variance among the five group means is quite small; multiplying it by the sample size (20) yields 0.72, about the same as the average variance within groups (1.08). These are both about the same as the parametric variance for these populations, which I set to 1.0.
|Four samples (n=20) from populations with parametric means of 5; the last sample is from a population with a parametric mean of 3.5. Red bars indicate sample means.|
The second graph is the same as the first, except that I have subtracted 1.5 from each value in the last sample. The average variance within groups (MSwithin) is exactly the same, because each value was reduced by the same amount; the size of the variation among values within a group doesn't change. The variance among groups does get bigger, because the mean for the last group is now quite a bit different from the other means. MSamong is therefore quite a bit bigger than MSwithin, so the ratio of the two (Fs) is much larger than 1.
The theoretical distribution of Fs under the null hypothesis is given by the F-distribution. It depends on the degrees of freedom for both the numerator (among-groups) and denominator (within-groups). The probability associated with an F-statistic is given by the spreadsheet function FDIST(x, df1, df2), where x is the observed value of the F-statistic, df1 is the degrees of freedom in the numerator (the number of groups minus one, for a one-way anova) and df2 is the degrees of freedom in the denominator (total n minus the number of groups, for a one-way anova).
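If you're not working in a spreadsheet, any statistics library that provides the F-distribution will do the same job; for example, assuming SciPy is available, its survival function `f.sf` gives the same upper-tail probability as FDIST:

```python
from scipy.stats import f

# P-value for an observed F-statistic of 7.12 with
# df1 = 4 (groups minus one) and df2 = 34 (total n minus groups),
# the values from the mussel example in the text.
p = f.sf(7.12, 4, 34)   # sf = 1 - CDF, i.e. the upper tail, like FDIST
print(p)                # a small probability, around 0.0003
```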
Here are some data on a shell measurement (the length of the anterior adductor muscle scar, standardized by dividing by length) in the mussel Mytilus trossulus from five locations: Tillamook, Oregon; Newport, Oregon; Petersburg, Alaska; Magadan, Russia; and Tvarminne, Finland, taken from a much larger data set used in McDonald et al. (1991).
Tillamook   Newport   Petersburg   Magadan   Tvarminne
 0.0571      0.0873     0.0974      0.1033     0.0703
 0.0813      0.0662     0.1352      0.0915     0.1026
 0.0831      0.0672     0.0817      0.0781     0.0956
 0.0976      0.0819     0.1016      0.0685     0.0973
 0.0817      0.0749     0.0968      0.0677     0.1039
 0.0859      0.0649     0.1064      0.0697     0.1045
 0.0735      0.0835     0.1050      0.0764
 0.0659      0.0725                 0.0689
 0.0923
 0.0836
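For readers who want to check the arithmetic, here is a sketch of the whole anova on these data in plain Python, using the general unbalanced formula: the among-group sum of squares weights each squared deviation of a group mean from the grand mean by that group's sample size.

```python
from statistics import mean

# Anterior adductor muscle scar length / total length, by location:
groups = {
    "Tillamook":  [0.0571, 0.0813, 0.0831, 0.0976, 0.0817,
                   0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport":    [0.0873, 0.0662, 0.0672, 0.0819, 0.0749,
                   0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968,
                   0.1064, 0.1050],
    "Magadan":    [0.1033, 0.0915, 0.0781, 0.0685, 0.0677,
                   0.0697, 0.0764, 0.0689],
    "Tvarminne":  [0.0703, 0.1026, 0.0956, 0.0973, 0.1039,
                   0.1045],
}

all_obs = [x for g in groups.values() for x in g]
grand_mean = mean(all_obs)

# Sums of squares, unbalanced formulas:
ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())

df_among = len(groups) - 1               # number of groups minus one
df_within = len(all_obs) - len(groups)   # total n minus number of groups

f_s = (ss_among / df_among) / (ss_within / df_within)
print(df_among, df_within, round(f_s, 2))
```

The degrees of freedom (4 and 34) and the F-statistic agree with the anova table reported below.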
The conventional way of reporting the complete results of an anova is with a table (the "sum of squares" column is often omitted). Here are the results of a one-way anova on the mussel data:
                 sum of squares   d.f.   mean square    Fs        P
among groups        0.00452         4      0.001130    7.12   2.8×10⁻⁴
within groups       0.00539        34      0.000159
total               0.00991        38
If you're not going to use the mean squares for anything, you could just report this as "The means were significantly heterogeneous (one-way anova, F₄,₃₄ = 7.12, P = 2.8×10⁻⁴)." The degrees of freedom are given as a subscript to F.
Note that statisticians often call the within-group mean square the "error" mean square. I think this can be confusing to non-statisticians, as it implies that the variation is due to experimental error or measurement error. In biology, the within-group variation is often largely the result of real, biological variation among individuals, not the kind of mistakes implied by the word "error."
Graphing the results
|Length of the anterior adductor muscle scar divided by total length in Mytilus trossulus. Means ±one standard error are shown for five locations.|
The usual way to graph the results of a one-way anova is with a bar graph. The heights of the bars indicate the means, and there's usually some kind of error bar: 95% confidence intervals, standard errors, or comparison intervals. Be sure to say in the figure caption what the error bars represent.
How to do the test
I have put together a spreadsheet to do one-way anova on up to 50 groups and 1000 observations per group. It calculates the P-value, does unplanned comparisons of means (appropriate for a model I anova) using Gabriel comparison intervals and the Tukey–Kramer test, and partitions the variance (appropriate for a model II anova) into among- and within-groups components.
Some versions of Excel include an "Analysis ToolPak," which includes an "Anova: Single Factor" function that will do a one-way anova. You can use it if you want, but I can't help you with it. It does not include any techniques for unplanned comparisons of means, and it does not partition the variance.
Several people have put together web pages that will perform a one-way anova. One good one is easy to use and will handle three to 26 groups and 3 to 1024 observations per group; it does not calculate statistics used for unplanned comparisons, and it does not partition the variance. Another good web page for anova is Rweb.
There are several SAS procedures that will perform a one-way anova. The two most commonly used are PROC ANOVA and PROC GLM. Either would be fine for a one-way anova, but PROC GLM (which stands for "General Linear Models") can be used for a much greater variety of more complicated analyses, so you might as well use it for everything.
Here is a SAS program to do a one-way anova on the mussel data from above.
data musselshells;
   input location $ aam;
   cards;
Tillamook 0.0571
====See the web page for the full data set====
Tvarminne 0.1045
;
proc glm data=musselshells;
   class location;
   model aam = location;
run;
The output includes the traditional anova table; the P-value is given under "Pr > F".
                               Sum of
Source             DF         Squares     Mean Square    F Value    Pr > F
Model               4      0.00451967      0.00112992       7.12    0.0003
Error              34      0.00539491      0.00015867
Corrected Total    38      0.00991458
If the data show a lot of heteroscedasticity (different groups have different variances), the one-way anova can yield an inaccurate P-value; the probability of a false positive may be much higher than 5 percent. In that case, the most common alternative is Welch's anova. This can be done in SAS by adding a MEANS statement, the name of the nominal variable, and the word WELCH following a slash. Here is the example SAS program from above, modified to do Welch's anova:
proc glm data=musselshells;
   class location;
   model aam = location;
   means location / welch;
run;
Here is the output:
             Welch's ANOVA for aam

Source           DF    F Value    Pr > F
location     4.0000       5.66    0.0051
Error       15.6955
Sokal and Rohlf, pp. 207-217.
Zar, p. 183.
McDonald, J.H., R. Seed and R.K. Koehn. 1991. Allozymes and morphometric characters of three species of Mytilus in the Northern and Southern Hemispheres. Mar. Biol. 111:323-333.
This page was last revised August 31, 2009. Its address is http://udel.edu/~mcdonald/statanovasig.html. It may be cited as pp. 130-136 in:
McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.