With a Model I anova, in addition to testing the overall heterogeneity among the means of more than two groups, it is often desirable to perform additional comparisons of subsets of the means. For example, let's say you have measured the height of the arch of the foot in athletes from nine women's teams: soccer, basketball, rugby, swimming, softball, volleyball, lacrosse, crew and cross-country. You might want to compare the mean of sports that involve a lot of jumping (basketball and volleyball) vs. all other sports. Or you might want to compare swimming vs. all other sports. Or you might want to compare soccer vs. basketball, since they involve similar amounts of running but different amounts of kicking. There are thousands of ways of dividing up nine groups into subsets, and if you do unplanned comparisons, you have to adjust your P-value to a much smaller number to take all the possible tests into account. It is better to plan a small number of interesting comparisons before you collect the data, because then it is not necessary to adjust the P-value for all the tests you didn't plan to do.
It is best if your planned comparisons are orthogonal, because then you do not need to adjust the P-value at all. Orthogonal comparisons are those in which all of the comparisons are independent; you do not compare the same means to each other twice. Doing one comparison of soccer vs. basketball and one of swimming vs. cross-country would be orthogonal, as would soccer vs. basketball and rugby vs. lacrosse. Soccer vs. basketball and soccer vs. swimming would not be orthogonal, because soccer appears in both comparisons. Jumping sports (basketball and volleyball) vs. non-jumping sports (all others), rugby vs. lacrosse, and softball vs. crew would be three orthogonal comparisons. Jumping sports vs. non-jumping sports and volleyball vs. swimming would not be orthogonal, because the volleyball vs. swimming comparison is included in the jumping vs. non-jumping comparison. Non-ball sports (swimming, crew, cross-country) vs. ball sports and jumping vs. non-jumping would not be orthogonal, because swimming vs. volleyball, among several other pairs, would be included in both comparisons.
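With equal sample sizes, you can check orthogonality mechanically by writing each comparison as a set of contrast coefficients (positive for one side, negative for the other, summing to zero) and testing whether the dot product of two coefficient vectors is zero. A minimal sketch in Python, using the sports example (the group order is an arbitrary choice for illustration):

```python
# Check whether two planned comparisons are orthogonal by taking the dot
# product of their contrast coefficients. This test assumes equal sample
# sizes in all groups.
# Group order (arbitrary): soccer, basketball, rugby, swimming, softball,
# volleyball, lacrosse, crew, cross-country.

def is_orthogonal(c1, c2):
    """Two contrasts are orthogonal if their coefficient dot product is zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

#                       soc bas rug swi sof vol lac cre  xc
soccer_vs_basketball = [ 1, -1,  0,  0,  0,  0,  0,  0,  0]
swim_vs_crosscountry = [ 0,  0,  0,  1,  0,  0,  0,  0, -1]
# Jumping (basketball, volleyball) vs. the seven other sports: +7 for each
# jumping sport and -2 for each other sport, so the coefficients sum to zero.
jumping_vs_rest      = [-2,  7, -2, -2, -2,  7, -2, -2, -2]
volleyball_vs_swim   = [ 0,  0,  0, -1,  0,  1,  0,  0,  0]

print(is_orthogonal(soccer_vs_basketball, swim_vs_crosscountry))  # True
print(is_orthogonal(jumping_vs_rest, volleyball_vs_swim))         # False
```

The second pair fails because volleyball and swimming already sit on opposite sides of the jumping vs. non-jumping contrast, which is exactly the overlap described above.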
The degrees of freedom for each planned comparison is equal to the number of groups, after pooling, minus one. Thus the jumping vs. non-jumping comparison would have one degree of freedom, and non-jumping vs. basketball vs. volleyball would have two degrees of freedom. The maximum total number of degrees of freedom for a set of orthogonal comparisons is the numerator degrees of freedom for the original anova (the original number of groups minus one). You do not need to do a full set of orthogonal comparisons; in this example, you might want to do jumping vs. non-jumping, then stop. Here is an example of a full set of orthogonal comparisons for the sport example; note that the degrees of freedom add up to eight, the number of original groups minus one.
- Jumping (basketball and volleyball) vs. non-jumping sports; 1 d.f.
- Basketball vs. volleyball; 1 d.f.
- Soccer+rugby+lacrosse+softball (ball sports) vs. swimming vs. crew vs. cross-country; 3 d.f.
- Rugby+lacrosse+softball (non-kicking sports) vs. soccer; 1 d.f.
- Rugby vs. lacrosse vs. softball; 2 d.f.
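The degrees-of-freedom bookkeeping for this set can be checked with a few lines; each comparison's d.f. is its number of groups after pooling, minus one:

```python
# d.f. for each planned comparison = number of groups after pooling, minus one.
# For a full orthogonal set, the d.f. sum to the anova's numerator d.f.
# (here 9 groups - 1 = 8).
comparisons = {
    "jumping vs. non-jumping": 2,                 # 2 pooled groups
    "basketball vs. volleyball": 2,
    "ball sports vs. swimming vs. crew vs. cross-country": 4,
    "non-kicking sports vs. soccer": 2,
    "rugby vs. lacrosse vs. softball": 3,
}
df = {name: groups - 1 for name, groups in comparisons.items()}
print(df)                # 1, 1, 3, 1, 2
print(sum(df.values()))  # 8
```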
To perform a planned comparison, you simply perform an anova on the pooled data. If you have a spreadsheet with the foot arch data from the nine sports teams in nine columns, and you want to do jumping sports vs. non-jumping sports as a planned comparison, simply copy the volleyball data and paste it at the bottom of the basketball column. Then combine all of the data from the other seven sports into a single column. The resulting P-value is the correct value to use for the planned comparison.
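The same pooling can be done in code instead of a spreadsheet. A minimal sketch using `scipy.stats.f_oneway`, with made-up arch-height numbers standing in for real data (only four of the nine sports are shown, to keep it short):

```python
# A planned comparison is just a one-way anova on the pooled groups.
# The arch-height values below are hypothetical, for illustration only.
from scipy.stats import f_oneway

basketball = [6.2, 5.9, 6.5, 6.1]
volleyball = [6.4, 6.0, 6.3]
soccer     = [5.1, 5.4, 5.0]
swimming   = [5.3, 5.6, 5.2, 5.5]

# Pool the jumping sports into one group and the others into a second group,
# then run the anova on the two pooled groups.
jumping     = basketball + volleyball
non_jumping = soccer + swimming
f, p = f_oneway(jumping, non_jumping)
print(f"F = {f:.2f}, P = {p:.4f}")
```

With only two pooled groups this is equivalent to a two-sample t-test, but the same pooling approach works for multi-group comparisons such as rugby vs. lacrosse vs. softball.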
Sometimes the hypotheses you are interested in make it necessary to do non-orthogonal planned comparisons. For example, you might want to do jumping vs. non-jumping sports, ball sports vs. non-ball sports, swimming vs. crew, and soccer vs. all other sports. In this case, it is necessary to adjust the alpha level downward to account for the multiple tests you are doing.
To understand why this is necessary, imagine that you did 100 planned comparisons on the sports data set. Under the null hypothesis that the means were homogeneous, you would expect about 5 of the comparisons to be "significant" at the P<0.05 level. This is what P<0.05 means, after all: 5% of the time you get a "significant" result even though the null hypothesis is true. Clearly it would be a mistake to consider those 5 comparisons that happen to have P<0.05 to be significant rejections of the particular null hypotheses that each comparison tests. Instead you want to use a lower alpha, so the overall probability is less than 0.05 that the set of planned comparisons includes one with a P-value less than the adjusted alpha.
The sequential Dunn–Sidák method is one good way to adjust alpha levels for planned, non-orthogonal comparisons. First, the P-values from the different comparisons are put in order from smallest to largest. If there are k comparisons, the smallest P-value must be less than 1-(1-alpha)^(1/k) to be significant at the alpha level. Thus if there are four comparisons, the smallest P-value must be less than 1-(1-0.05)^(1/4)=0.0127 to be significant at the 0.05 level. If it is not significant, the analysis stops. If the smallest P-value is significant, the next smallest P-value must be less than 1-(1-alpha)^(1/(k-1)), which in this case would be 0.0170. If it is significant, the next P-value must be less than 1-(1-alpha)^(1/(k-2)), and so on until one of the P-values is not significant.
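The step-down procedure just described can be sketched in a few lines of Python. The function below is an illustration of the sequential criterion, not a standard library routine; it takes the comparison P-values and returns how many are significant:

```python
# Sequential (step-down) Dunn-Sidak procedure. Sorts the P-values, then
# compares the i-th smallest against 1-(1-alpha)^(1/(k-i)), stopping at the
# first non-significant one.

def sequential_dunn_sidak(p_values, alpha=0.05):
    """Return the number of P-values significant at the overall alpha level."""
    p_sorted = sorted(p_values)
    k = len(p_sorted)
    significant = 0
    for i, p in enumerate(p_sorted):
        critical = 1 - (1 - alpha) ** (1 / (k - i))
        if p < critical:
            significant += 1
        else:
            break  # stop at the first non-significant P-value
    return significant

# Four comparisons: 0.010 < 0.0127 and 0.015 < 0.0170 are significant,
# but 0.030 > 0.0253, so testing stops there.
print(sequential_dunn_sidak([0.030, 0.010, 0.200, 0.015]))  # 2
```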
Other techniques for adjusting the alpha are less powerful than the sequential method, but you will often see them in the literature and should therefore be aware of them. The Bonferroni method uses alpha/k as the adjusted alpha level, while the Dunn–Sidák method uses 1-(1-alpha)^(1/k). The difference between the Bonferroni and Dunn–Sidák adjusted alphas is quite small, so it usually wouldn't matter which you used. For example, the Bonferroni alpha for four comparisons is 0.0125, while the Dunn–Sidák is 0.0127. These are not sequential methods; the same adjusted alpha is used, no matter how many of the comparisons are significant.
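A quick computation shows how close the two non-sequential adjustments are for a few values of k:

```python
# Bonferroni and Dunn-Sidak adjusted alphas for k comparisons; the two
# are nearly identical for any realistic k.
def bonferroni(alpha, k):
    return alpha / k

def dunn_sidak(alpha, k):
    return 1 - (1 - alpha) ** (1 / k)

for k in (2, 4, 10):
    print(k, round(bonferroni(0.05, k), 4), round(dunn_sidak(0.05, k), 4))
# k=4 gives 0.0125 (Bonferroni) vs. 0.0127 (Dunn-Sidak)
```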
Really important note about planned comparisons
Planned comparisons must be planned before you look at the data. If you look at some data, pick out an interesting comparison, then analyze it as if it were a planned comparison, you will be committing scientific fraud. For example, if you look at the mean arch heights for the nine sports, see that cross-country has the lowest mean and swimming has the highest mean, then compare just those two means, your P-value will be much too low. This is because there are 36 possible pairwise comparisons in a set of 9 means. You expect 5 percent, or 1 out of 20, of tests to be "significant" at the P<0.05 level even if all the data really fit the null hypothesis, so there's a good chance that the most extreme comparison in a set of 36 will have a P-value less than 0.05.
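To put a rough number on that "good chance": the pairwise tests are not independent (they share means), but if they were, the probability that at least one of the 36 reaches P<0.05 under the null hypothesis would be about 84%:

```python
# Number of pairwise comparisons among 9 means, and the chance that at
# least one is "significant" at P<0.05 under the null hypothesis, assuming
# (as a rough approximation) that the 36 tests were independent.
from math import comb

n_pairs = comb(9, 2)
print(n_pairs)                         # 36
print(round(1 - 0.95 ** n_pairs, 2))  # 0.84
```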
It would be acceptable to run a pilot experiment and plan your planned comparisons based on the results of the pilot experiment. However, if you do this you could not include the data from the pilot experiment in the analysis; you would have to limit your anova to the new data.
How to do the tests
To do a planned comparison using the one-way anova spreadsheet, just combine and delete data from the original data set to make a new data set with the comparison you want, then paste it into the anova spreadsheet. If you're moving data around within the anova spreadsheet, use the "copy" and "paste" commands to copy the data to the new destination, followed by "clear" to clear it from the original location; if you use the "cut" and "paste" commands, it will change the references in some of the formulas and mess things up. You might be better off doing your rearranging in a separate spreadsheet, then copying and pasting from there into the anova spreadsheet.
For example, look at the mussel shell data from the previous page. If one of your planned contrasts was "Oregon vs. North Pacific", you'd put the data from Newport, Oregon and Tillamook, Oregon into one column labelled "Oregon," put the data from Petersburg, Alaska and Magadan, Russia in a second column labelled "North Pacific," and delete the Tvarminne data. Putting these two columns of data into the anova spreadsheet gives F(1, 31) = 5.31, P = 0.029, so you would conclude that there was a significant difference between the Oregon and North Pacific mussels.
To do non-orthogonal planned comparisons with the sequential Dunn–Sidák method, do each comparison and collect the P-values into a separate spreadsheet. Sort the P-values from smallest to largest, then see which ones meet the sequential Dunn–Sidák criteria described above.
To do a planned comparison using a web page, just clear the web page (there's usually a button marked "Clear" or "Reset") and enter the data for whichever comparison you want to do. You may want to rearrange your data in a spreadsheet, then paste it into the web page.
To do planned comparisons in SAS, the simplest way would be to make a new data set in which you delete the lines you don't want, and give new group names to the lines you want to group together. For the mussel shell data from the previous page, if one of your planned contrasts was "Oregon vs. North Pacific", you could change "Newport" and "Tillamook" to "Oregon," change "Petersburg" and "Magadan" to "North_Pacific," and delete the Tvarminne data, then run PROC GLM on the modified data set. If you're experienced with SAS, you can figure out easier ways to do this, but this will work.
Sokal and Rohlf, pp. 229-242.
This page was last revised August 31, 2009. Its address is http://udel.edu/~mcdonald/statanovaplanned.html. It may be cited as pp. 137-140 in:
McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.