In a Model I anova, it is often desirable to perform additional comparisons of subsets of the means. If you didn't decide on some planned comparisons before doing the anova, you will be doing unplanned comparisons. Because these are unplanned, you can't just do the comparison as an anova and use the resulting P-value. Instead you have to use a test that takes into account the large number of possible comparisons you could have done. For example, if you did an anova with five groups (A, B, C, D, and E), then noticed that A had the highest mean and D had the lowest, you couldn't do an anova on just A and D. There are 10 possible pairs you could have compared (A with B, A with C, etc.), and the probability under the null hypothesis that at least one of those 10 pairs is "significant" at the P<0.05 level is much greater than 0.05. It gets much worse if you consider all of the possible ways of dividing the groups into two sets (A vs. B, A vs. B+C, A vs. B+C+D, A+B vs. C+D, etc.) or more than two sets (A vs. B vs. C, A vs. B vs. C+D, etc.).
There is a bewildering array of tests that have been proposed for unplanned comparisons; some of the more popular include the Student–Newman–Keuls (SNK) test, Duncan's multiple range test, the Tukey–Kramer method, the REGWQ method, and Fisher's Least Significant Difference (LSD). For this handbook, I am covering only two techniques, Gabriel comparison intervals and the Tukey–Kramer method, both of which apply only to unplanned comparisons of pairs of group means.
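To put a rough number on "much greater than 0.05", here is a minimal Python sketch that counts the possible pairs among five groups and estimates the chance of at least one false positive. It assumes the 10 tests are independent, which pairwise comparisons sharing a group are not, so treat the result as a ballpark figure only.

```python
from itertools import combinations

groups = ["A", "B", "C", "D", "E"]
alpha = 0.05

# Every unordered pair of groups that could have been compared.
pairs = list(combinations(groups, 2))
n_pairs = len(pairs)

# Rough familywise error rate, ASSUMING independent tests.
# Pairwise comparisons that share a group are correlated, so this is
# only an approximation of the true rate.
familywise = 1 - (1 - alpha) ** n_pairs

print(n_pairs)               # 10
print(round(familywise, 3))  # 0.401
```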
I will not consider tests that apply to unplanned comparisons of more than two means, or unplanned comparisons of subsets of groups. There are techniques available for this (the Scheffé test is probably the most common), but with a moderate number of groups, the number of possible comparisons becomes so large that the P-values required for significance become ridiculously small.
To compute the Gabriel comparison interval (Gabriel 1978), the standard error of the mean for a group is multiplied by the studentized maximum modulus times the square root of one-half. The standard error of the mean is estimated by dividing the MSwithin from the entire anova by the number of observations in the group, then taking the square root of that quantity. The studentized maximum modulus is a statistic that depends on the number of groups, the total sample size in the anova, and the desired probability level (alpha).
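The calculation just described can be sketched in a few lines of Python. The function name and the numbers fed to it are hypothetical, chosen only to show the arithmetic; in practice the studentized maximum modulus critical value has to come from a table or from statistical software, which this sketch does not attempt.

```python
import math

def gabriel_interval(ms_within, n, smm_crit):
    """Gabriel comparison interval for one group.

    ms_within: within-group mean square from the whole anova
    n:         number of observations in this group
    smm_crit:  studentized maximum modulus critical value (supplied by the user)
    """
    se_mean = math.sqrt(ms_within / n)         # standard error of the mean
    return se_mean * smm_crit * math.sqrt(0.5)

# HYPOTHETICAL inputs, for illustration only (not taken from the mussel data).
ms_within = 0.0002
smm_crit = 3.0

small_group = gabriel_interval(ms_within, 8, smm_crit)
large_group = gabriel_interval(ms_within, 32, smm_crit)

print(round(small_group, 6))  # 0.010607
print(round(large_group, 6))  # 0.005303
```

Because the group size enters only through the standard error, quadrupling the number of observations halves the interval.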
Once the Gabriel comparison interval is calculated, the lower comparison limit is found by subtracting the interval from the mean, and the upper comparison limit is found by adding the interval to the mean. This is done for each group in an anova. Any pair of groups whose comparison intervals do not overlap is significantly different at the P<alpha level. For example, on the graph of the mussel data shown below, there is a significant difference in AAM between mussels from Newport and mussels from Petersburg. Tillamook and Newport do not have significantly different AAM, because their Gabriel comparison intervals overlap.
Mean AAM (anterior adductor muscle scar standardized by total shell length) for Mytilus trossulus from five locations. Means are shown with Gabriel comparison intervals (Gabriel 1978); pairs of means whose comparison intervals do not overlap are significantly different (P<0.05). Data from the one-way anova page.
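The overlap rule can also be checked numerically. The sketch below uses the 95% comparison limits from the SAS output later on this page, keeping the location labels as SAS truncates them, and lists every pair whose intervals do not overlap. The helper name is an illustrative choice, not part of any library.

```python
from itertools import combinations

# (lower, upper) Gabriel comparison limits, copied from the SAS output
# on this page; labels are the 8-character names SAS prints.
limits = {
    "Petersbu": (0.093411, 0.113475),
    "Tvarminn": (0.084864, 0.106536),
    "Tillamoo": (0.071806, 0.088594),
    "Magadan":  (0.068628, 0.087397),
    "Newport":  (0.065416, 0.084184),
}

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Pairs whose comparison intervals do NOT overlap are significantly different.
significant = [
    (g1, g2)
    for g1, g2 in combinations(limits, 2)
    if not intervals_overlap(limits[g1], limits[g2])
]

for g1, g2 in significant:
    print(g1, "vs.", g2)
```

With these limits, Newport and Petersburg come out as significantly different and Tillamook and Newport do not, matching the reading of the graph described above.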
I like Gabriel comparison intervals; the results are about the same as with other techniques for unplanned comparisons of pairs of means, but you can present them in a more easily understood form. However, Gabriel comparison intervals are not that commonly used. If you are using them, it is very important to emphasize that the vertical bars represent comparison intervals and not the more common (but less useful) standard errors of the mean or 95% confidence intervals. You must also explain that means whose comparison intervals do not overlap are significantly different from each other.
In the Tukey–Kramer method, the minimum significant difference (MSD) is calculated for each pair of means. If the observed difference between a pair of means is greater than the MSD, the pair of means is significantly different.
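As a minimal sketch of that comparison, the MSD for two groups is usually written as the critical value of the studentized range, q, times the square root of (MSwithin/2)(1/n_i + 1/n_j). The function names and all input values below are hypothetical; the critical value q depends on alpha, the number of groups, and the within-group degrees of freedom, and must be looked up separately (if SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf` should be able to supply it, though that is not used here).

```python
import math

def tukey_kramer_msd(q_crit, ms_within, n_i, n_j):
    """Minimum significant difference for one pair of group means."""
    return q_crit * math.sqrt((ms_within / 2) * (1 / n_i + 1 / n_j))

def significantly_different(mean_i, mean_j, msd):
    """A pair differs significantly if the observed difference exceeds the MSD."""
    return abs(mean_i - mean_j) > msd

# HYPOTHETICAL inputs, for illustration only.
q_crit = 4.0        # assumed critical value of the studentized range
ms_within = 0.0002  # assumed within-group mean square

msd = tukey_kramer_msd(q_crit, ms_within, n_i=7, n_j=8)
print(significantly_different(0.105, 0.075, msd))  # observed difference 0.030
print(significantly_different(0.080, 0.075, msd))  # observed difference 0.005
```

Because the sample sizes enter the formula, each pair of groups with unequal sizes gets its own MSD, which is why the method computes one per pair.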
The Tukey–Kramer method is much more popular than Gabriel comparison intervals. It is not as easy to display the results of the Tukey–Kramer method, however. One technique is to find all the sets of groups whose means do not differ significantly from each other, then indicate each set with a different symbol, such as a letter placed next to each mean.
Then you explain that "Means with the same letter are not significantly different from each other (Tukey–Kramer test, P<0.05)."
Another way to illustrate the results of the Tukey–Kramer method is with lines connecting means that are not significantly different from each other. This is easiest when the means are sorted from smallest to largest:
Mean AAM (anterior adductor muscle scar standardized by total shell length) for Mytilus trossulus from five locations. Pairs of means grouped by a horizontal line are not significantly different from each other (Tukey–Kramer method, P>0.05).
How to do the tests
The one-way anova spreadsheet, described on the anova significance page, calculates Gabriel comparison intervals. The interval it reports is the number that is added to or subtracted from the mean to give the Gabriel comparison limits. The spreadsheet also does the Tukey–Kramer test at the alpha=0.05 level, if you have 20 or fewer groups. The results of the Tukey–Kramer test are shown on the second sheet of the workbook.
I am not aware of any web pages that will either calculate Gabriel comparison intervals or do the Tukey–Kramer test.
To calculate Gabriel comparison limits using SAS, add a MEANS statement to PROC GLM. The first parameter after MEANS is the nominal variable, followed by a forward slash, then CLM and GABRIEL. CLM tells SAS to report the results of the Gabriel method as comparison limits. Here's the SAS program from the one-way anova web page, modified to present Gabriel comparison intervals:
    proc glm data=musselshells;
      class location;
      model aam = location;
      means location / clm gabriel;
    run;
The results are comparison limits. If you are graphing using a spreadsheet, you'll need to calculate the comparison interval, the difference between one of the comparison limits and the mean. For example, the comparison interval for Petersburg is 0.113475−0.103443=0.010032. This is what you put next to the mean on your spreadsheet, and you select it when you tell the spreadsheet what to add and subtract from the mean for the "error bars".
    location     N    Mean        95% Comparison Limits

    Petersbu     7    0.103443    0.093411    0.113475
    Tvarminn     6    0.095700    0.084864    0.106536
    Tillamoo    10    0.080200    0.071806    0.088594
    Magadan      8    0.078013    0.068628    0.087397
    Newport      8    0.074800    0.065416    0.084184
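The same subtraction can be done for every group at once. This is a small sketch that takes the means and limits from the SAS output above and returns the comparison interval to enter as the "error bar" value for each location.

```python
# (mean, lower limit, upper limit) copied from the SAS output above;
# labels are the 8-character names SAS prints.
sas_output = {
    "Petersbu": (0.103443, 0.093411, 0.113475),
    "Tvarminn": (0.095700, 0.084864, 0.106536),
    "Tillamoo": (0.080200, 0.071806, 0.088594),
    "Magadan":  (0.078013, 0.068628, 0.087397),
    "Newport":  (0.074800, 0.065416, 0.084184),
}

# Comparison interval = upper comparison limit minus the mean.
intervals = {
    location: upper - mean
    for location, (mean, lower, upper) in sas_output.items()
}

for location, interval in intervals.items():
    print(f"{location:<10}{interval:.6f}")
```

The limits are symmetric around the mean, so subtracting the lower limit from the mean gives the same value to within the rounding of the printed output.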
For the Tukey–Kramer technique using SAS, add a MEANS statement to PROC GLM. The first parameter after MEANS is the nominal variable, followed by a forward slash, then LINES and TUKEY. LINES tells SAS to report the results of the Tukey–Kramer method by giving means that are not significantly different the same letter. Here's the SAS program from the one-way anova web page, modified to do the Tukey–Kramer technique:
    proc glm data=musselshells;
      class location;
      model aam = location;
      means location / lines tukey;
    run;
Here's the output:
    Means with the same letter are not significantly different.

    Tukey Grouping        Mean      N    location

             A        0.103443      7    Petersbu
             A
        B    A        0.095700      6    Tvarminn
        B
        B    C        0.080200     10    Tillamoo
        B    C
        B    C        0.078013      8    Magadan
             C
             C        0.074800      8    Newport
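To turn the letters into a list of significant differences, look for pairs of locations that share no letter. Here is a minimal sketch, with the letter groupings transcribed by hand from the output above:

```python
from itertools import combinations

# Tukey grouping letters for each location, transcribed from the SAS output.
letters = {
    "Petersbu": "A",
    "Tvarminn": "AB",
    "Tillamoo": "BC",
    "Magadan":  "BC",
    "Newport":  "C",
}

# Means with the same letter are not significantly different, so a pair
# is significantly different only if the two letter sets are disjoint.
different = [
    (g1, g2)
    for g1, g2 in combinations(letters, 2)
    if not set(letters[g1]) & set(letters[g2])
]

for g1, g2 in different:
    print(g1, "vs.", g2)
```

For these data the four pairs that share no letter are Petersburg with each of Tillamook, Magadan, and Newport, plus Tvarminn with Newport.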
Sokal and Rohlf, pp. 240-260 (unplanned comparisons in general), 247-249 (Gabriel comparison intervals).
Zar, pp. 208-222.
Gabriel, K.R. 1978. A simple method of multiple comparison of means. J. Amer. Stat. Assoc. 73: 724-729.
This page was last revised September 11, 2009. Its address is http://udel.edu/~mcdonald/statancova.html. It may be cited as pp. 141-145 in:
McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.