When to use it
You use a nested anova when you have one measurement variable and two or more nominal variables. The nominal variables are nested, meaning that each value of one nominal variable (the subgroups) is found in combination with only one value of the higher-level nominal variable (the groups). The top-level nominal variable may be either Model I or Model II, but the lower-level nominal variables must all be Model II.
Nested analysis of variance is an extension of one-way anova in which each group is divided into subgroups. In theory, these subgroups are chosen randomly from a larger set of possible subgroups. For example, let's say you are testing the null hypothesis that stressed and unstressed rats have the same glycogen content in their gastrocnemius muscle. If you had one cage containing several stressed rats, another cage containing several unstressed rats, and one glycogen measurement from each rat, you would analyze the data using a one-way anova. However, you wouldn't know whether a difference in glycogen levels was due to the difference in stress, or some other difference between the cages--maybe the cage containing stressed rats gets more food, or is warmer, or happens to contain a mean rat who uses rat mind-control techniques to enslave all the other rats in the cage and get them to attack humans.
If, however, you had several cages of stressed rats and several cages of unstressed rats, with several rats in each cage, you could tell how much variation was among cages and how much was between stressed and unstressed. The groups would be stressed vs. unstressed, and each cage of several rats would be a subgroup; each glycogen level of a rat would be one observation within a subgroup.
The above is an example of a two-level nested anova; one level is the groups, stressed vs. unstressed, while another level is the subgroups, the different cages. If you worry about the accuracy of your glycogen assay, you might make multiple assays on each rat. In that case you would have a three-level nested anova, with groups (stressed vs. unstressed), subgroups (cages), and subsubgroups (the set of observations on each rat would be a subsubgroup). You can have more levels, too.
Note that if the subgroups, subsubgroups, etc. are distinctions with some interest (Model I), rather than random, you should not use a nested anova. For example, you might want to divide the stressed rats into male and female subgroups, and the same for the unstressed rats. Male and female are not distinctions without interest; you would be interested to know that one sex had higher glycogen levels than the other. In this case you would use a two-way anova to analyze the data, rather than a nested anova.
Sometimes the distinction can be subtle. For example, let's say you measured the glycogen content of the right gastrocnemius muscle and left gastrocnemius muscle from each rat. If you think there might be a consistent right vs. left difference, you would use a two-way anova to analyze right vs. left and stressed vs. unstressed. If, however, you think that any difference between the two muscles of an individual rat is due to random variation in your assay technique, not a real difference between right and left, you could use a nested anova, with muscles as one level. Think of it this way: if you dissected out the muscles, labeled the tubes "A" and "B," then forgot which was right and which was left, it wouldn't matter if you were doing a nested anova; it would be a disaster if you were doing a two-way anova.
A nested anova has one null hypothesis for each level. In a two-level nested anova, one null hypothesis would be that the subgroups within each group have the same means; the second null hypothesis would be that the groups have the same means.
How the test works
Remember that in a one-way anova, the test statistic, Fs, is the ratio of two mean squares: the mean square among groups divided by the mean square within groups. If the variation among groups (the group mean square) is high relative to the variation within groups, the test statistic is large and therefore unlikely to occur by chance. In a two-level nested anova, there are two F statistics, one for subgroups (Fsubgroup) and one for groups (Fgroup). The subgroup F-statistic is found by dividing the among-subgroup mean square, MSsubgroup (the average variance of subgroup means within each group) by the within-subgroup mean square, MSwithin (the average variation among individual measurements within each subgroup). The group F-statistic is found by dividing the among-group mean square, MSgroup (the variation among group means) by MSsubgroup. The P-value is then calculated for the F-statistic at each level.
For a nested anova with three or more levels, the F-statistic at each level is calculated by dividing the MS at that level by the MS at the level immediately below it.
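Written out as formulas, the ratios described above look like this. The subscripted symbols are only a typeset form of Fsubgroup, Fgroup, MSgroup, MSsubgroup and MSwithin; the level index k in the second line is notation introduced here for illustration, with level k+1 meaning the level immediately below level k.

```latex
% Two-level nested anova
F_{\text{subgroup}} = \frac{MS_{\text{subgroup}}}{MS_{\text{within}}},
\qquad
F_{\text{group}} = \frac{MS_{\text{group}}}{MS_{\text{subgroup}}}

% Three or more levels: each level is tested against the level below it
F_{k} = \frac{MS_{k}}{MS_{k+1}}
```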
If the subgroup F-statistic is not significant, it is possible to calculate the group F-statistic by dividing MSgroup by MSpooled, a combination of MSsubgroup and MSwithin. The conditions under which this is acceptable are complicated, and some statisticians think you should never do it; for simplicity, I suggest always using MSgroup / MSsubgroup to calculate Fgroup.
In addition to testing the equality of the means at each level, a nested anova also partitions the variance into different levels. This can be a great help in designing future experiments. For example, let's say you did a four-level nested anova with stressed vs. unstressed as groups, cages as subgroups, individual rats as subsubgroups, and the two gastrocnemius muscles as subsubsubgroups, with multiple glycogen assays per muscle. If most of the variation is among rats, with relatively little variation among muscles or among assays on each muscle, you might want to do just one assay per rat and use a lot more rats in your next experiment. This would give you greater statistical power than taking repeated measurements on a smaller number of rats. If the nested anova tells you there is variation among cages, you would either want to use more cages or try to control whatever variable is causing the cages to differ in the glycogen content of their rats; maybe the exercise wheel is broken in some of the cages, or maybe some cages have more rats than others. If you had an estimate of the relative cost of different parts of the experiment, such as keeping more rats vs. doing more muscle preps, formulas are available to help you design the most statistically powerful experiment for a given amount of money; see Sokal and Rohlf, pp. 309-317.
Mixed-model vs. pure Model II nested anova
All of the subgroups, subsubgroups, etc. in a nested anova should be based on distinctions of no inherent interest, of the kind analyzed with a Model II one-way anova. The groups at the top level may also be of no inherent interest, in which case it is a pure Model II nested anova. This often occurs in quantitative genetics. For example, if you are interested in estimating the heritability of ammonia content in chicken manure, you might have several roosters, each with several broods of chicks by different hens, with each chick having several ammonia assays of its feces. The offspring of each rooster would be the groups, the offspring of each hen would be the subgroups, and the set of ammonia assays on each chick would be subsubgroups. This would be a pure Model II anova, because you would want to know what proportion of the total variation in ammonia content was due to variation among roosters, as a way of estimating heritability; you wouldn't be interested in which rooster had offspring with the lowest or highest ammonia content in their feces. In a pure Model II nested anova, partitioning the variance is of primary importance.
If the top-level groups are of inherent interest, of the kind analyzed with a Model I one-way anova, then it is a mixed-model nested anova. The stressed vs. unstressed rat example is a mixed-model anova, because stressed vs. unstressed is what you are interested in. The ammonia in chicken feces example could also be analyzed using a mixed-model nested anova, if you were really interested in knowing which rooster had offspring with the lowest ammonia in their feces. This might be the case if you were going to use the best rooster to sire the next generation of chickens at your farm. In a mixed-model nested anova, partitioning the variance is of less interest than the significance test of the null hypothesis that the top-level groups have the same mean. You can then do planned comparisons among the top-level means, just as you would for a one-way anova, or use the Tukey-Kramer test or Gabriel comparison intervals for unplanned comparisons. Even in a mixed-model nested anova, partitioning the variance may help you design better experiments by revealing which level needs to be controlled better or replicated more.
Unequal sample sizes
When the sample sizes in a nested anova are unequal, the P-values corresponding to the F-statistics may not be very good estimates of the actual probability. For this reason, you should try to design your experiments with a "balanced" design, meaning equal sample sizes in each subgroup. Often this is impractical; if you do have unequal sample sizes, you may be able to get a better estimate of the correct P-value by using modified mean squares at each level, found using a correction formula called the Satterthwaite approximation. In some situations, however, the Satterthwaite approximation will make the P-values less accurate. If the Satterthwaite approximation cannot be used, the P-values will be conservative (less likely to be significant than they ought to be). Note that the Satterthwaite approximation results in fractional degrees of freedom, such as 2.87.
Examples
Keon and Muir (2002) wanted to know whether habitat type affected the growth rate of the lichen Usnea longissima. They weighed and transplanted 30 individuals into each of 12 sites in Oregon. The 12 sites were grouped into 4 habitat types, with 3 sites in each habitat. One year later, they collected the lichens, weighed them again, and calculated the change in weight. There are two nominal variables (site and habitat type), with sites nested within habitat type. One could analyze the data using two measurement variables, beginning weight and ending weight, but because the lichen individuals were chosen to have similar beginning weights, it makes more sense to use the change in weight as a single measurement variable. The results of a mixed-model nested anova are that there is significant variation among sites within habitats (F8, 200=8.11, P=1.8 × 10^-9) and significant variation among habitats (F3, 8=8.29, P=0.008). When the Satterthwaite approximation is used, the test of the effect of habitat is only slightly different (F3, 8.13=8.76, P=0.006).
Students in my section of Advanced Genetics Lab collected data on the codon bias index (CBI), a measure of the nonrandom use of synonymous codons from genes in Drosophila melanogaster. The groups are three chromosomes, and the subgroups are small regions within each chromosome. Each observation is the CBI value for a single gene in that chromosome region, and there were several genes per region. The data are shown below in the SAS program.
The results of the nested anova were F3, 30=6.92, P=0.001 for subgroups and F2, 3=0.10, P=0.91 for groups, without the Satterthwaite correction; using the correction changes the results only slightly. The among-subgroup variation is 49.9% of the total, while the among-group variation is 0%. The conclusion is that there is a lot of variation in CBI among different regions within a chromosome, so in order to see whether there is any difference among the chromosomes, it will be necessary to sample a lot more regions on each chromosome. Since 50.1% of the variance is among genes within regions, it will be necessary to sample several genes within each region, too.
Graphing the results
The way you graph the results of a nested anova depends on the outcome and your biological question. If the variation among subgroups is not significant and the variation among groups is significant—you're really just interested in the groups, and you used a nested anova to see if it was okay to combine subgroups—you might just plot the group means on a bar graph, as shown for one-way anova. If the variation among subgroups is interesting, you can plot the means for each subgroup, with different patterns or colors indicating the different groups. Here's an example for the codon bias data:
Figure: Graph of mean codon bias index in different regions of Drosophila melanogaster chromosomes. Solid black bars are regions in chromosome 2, gray bars are chromosome 3, and empty bars are the X chromosome.
Similar tests
Both nested anova and two-way anova (and higher level anovas) have one measurement variable and more than one nominal variable. The difference is that in a two-way anova, the values of each nominal variable are found in all combinations with the other nominal variable; in a nested anova, each value of one nominal variable (the subgroups) is found in combination with only one value of the other nominal variable (the groups).
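The distinction can be checked mechanically from a list of (group, subgroup) labels. The following is a minimal Python sketch, not part of any statistics package; the function name `is_nested` and both example designs are hypothetical, made up here for illustration.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every subgroup label occurs with exactly one group label."""
    groups_seen = defaultdict(set)
    for group, subgroup in pairs:
        groups_seen[subgroup].add(group)
    return all(len(groups) == 1 for groups in groups_seen.values())

# Hypothetical nested design: each cage holds only stressed or only unstressed rats.
nested_design = [("stressed", "cage1"), ("stressed", "cage2"),
                 ("unstressed", "cage3"), ("unstressed", "cage4")]

# Hypothetical crossed design: both sexes occur in both treatments.
crossed_design = [("stressed", "male"), ("stressed", "female"),
                  ("unstressed", "male"), ("unstressed", "female")]

print(is_nested(nested_design))   # True
print(is_nested(crossed_design))  # False
```

The check only looks at labels, so it assumes each subgroup has a unique name. If the cages were numbered 1, 2, 3 separately within each treatment, the data would look crossed even though the design is nested.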
There doesn't seem to have been a lot of work done on non-parametric alternatives to nested anova. You could convert the measurement variable to ranks (replace each observation with its rank over the entire data set), then do a nested anova on the ranks; see Conover and Iman (1981).
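As a rough illustration of the rank transformation, here is a minimal Python sketch that replaces each observation with its rank over the entire data set. Giving tied values the average of their ranks is an assumption made here, and the function name and data values are hypothetical. The ranks would then go into the nested anova in place of the original measurements.

```python
def rank_transform(values):
    """Replace each value with its 1-based rank over the whole data set.

    Tied values share the average of the ranks they would occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

measurements = [0.366, 0.245, 0.245, 0.120]  # hypothetical values
print(rank_transform(measurements))  # [4.0, 2.5, 2.5, 1.0]
```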
How to do the test
I have made a spreadsheet to do a two-level nested anova, with equal or unequal sample sizes, on up to 50 subgroups with up to 1000 observations per subgroup. It does significance tests and partitions the variance. The spreadsheet tells you whether the Satterthwaite approximation is appropriate, using the rules on p. 298 of Sokal and Rohlf (1983), and gives you the option to use it. Fgroup is calculated as MSgroup/MSsubgroup. The spreadsheet gives the variance components as percentages of the total. If the estimate of the group component would be negative (which can happen in unbalanced designs), it is set to zero.
Rweb lets you do nested anovas. To use it, choose "ANOVA" from the Analysis Menu and choose "External Data: Use an option below" from the Data Set Menu, then either select a file to analyze or enter your data in the box. On the next page (after clicking on "Submit"), select the two nominal variables under "Choose the Factors" and select the measurement variable under "Choose the response." Fgroup is calculated as MSgroup/MSwithin, which is not a good idea if Fsubgroup is significant. Rweb does not partition the variance.
In SAS, PROC GLM will handle both balanced and unbalanced designs. List all the nominal variables in the CLASS statement. In the MODEL statement, give the name of the measurement variable, then after the equals sign give the name of the group variable, then the name of the subgroup variable followed by the group variable in parentheses, etc. The TEST statement tells it to calculate the F-statistic for groups by dividing the group mean square by the subgroup mean square, instead of the within-subgroup mean square ("h" stands for "hypothesis" and "e" stands for "error"). "htype=1 etype=1" tells SAS to use "type I sums of squares"; I couldn't tell you the difference between them and types II, III and IV, but I'm pretty sure that type I is appropriate for a nested anova.
Here is an example using data on the codon bias index (CBI), a measure of the nonrandom use of synonymous codons. The groups are three chromosomes in Drosophila melanogaster, and the subgroups are small regions within each chromosome. Each observation is the CBI value for a single gene in that chromosome region.
data flies;
  input gene $ chromosome $ region $ cbi;
  cards;
singed X 7D 0.366
====See the web page for the full data set====
sepia 3 66D 0.245
;
proc glm data=flies;
  class chromosome region;
  model cbi=chromosome region(chromosome) / ss1;
  test h=chromosome e=region(chromosome) / htype=1 etype=1;
run;
The output includes Fgroup calculated two ways, as MSgroup/MSwithin and as MSgroup/MSsubgroup.
Source               DF   Type I SS    Mean Sq.     F Value   Pr > F
chromosome            2   0.01032594   0.00516297   0.66      0.5255    MSgroup/MSwithin
region(chromosome)    3   0.16294606   0.05431535   6.92      0.0011

Tests of Hypotheses Using the Type I MS for region(chromosome) as an Error Term

Source               DF   Type I SS    Mean Sq.     F Value   Pr > F
chromosome            2   0.01032594   0.00516297   0.10      0.9120    MSgroup/MSsubgroup
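The three F values in this output can be reproduced by hand from the mean squares. The Python sketch below is only an arithmetic check, not a replacement for the SAS analysis; the variable names are hypothetical. The group and subgroup mean squares are copied from the PROC GLM output, and the within-subgroup mean square is computed from the Error line of the PROC NESTED output later on this page (sum of squares 0.235636 with 30 degrees of freedom).

```python
# Mean squares copied from the PROC GLM output.
ms_group = 0.00516297      # chromosome
ms_subgroup = 0.05431535   # region(chromosome)

# Within-subgroup mean square = error sum of squares / error degrees of freedom.
ms_within = 0.235636 / 30

f_subgroup = ms_subgroup / ms_within          # test of regions within chromosomes
f_group_vs_within = ms_group / ms_within      # SAS default for chromosome
f_group = ms_group / ms_subgroup              # what the TEST statement requests

print(round(f_subgroup, 2))         # 6.92
print(round(f_group_vs_within, 2))  # 0.66
print(round(f_group, 2))            # 0.1
```

The last value, 0.10 in the SAS output, is the one recommended on this page for the test of groups.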
To do the Tukey-Kramer test or Gabriel comparison intervals, add a MEANS statement, as shown here:
proc glm data=flies;
  class chromosome region;
  model cbi=chromosome region(chromosome) / ss1;
  test h=chromosome e=region(chromosome) / htype=1 etype=1;
  means chromosome /lines tukey;
  means chromosome /clm gabriel;
run;
Here is the output from these two MEANS statements when applied to the example data set:
Tukey's Studentized Range (HSD) Test for cbi

Means with the same letter are not significantly different.

Tukey Grouping      Mean      N    chromosome
      A          0.21975     12    3
      A
      A          0.19330     10    X
      A
      A          0.18014     14    2
. . .
Gabriel's Comparison Intervals for cbi

                                95% Comparison
chromosome     N      Mean          Limits
3             12   0.21975     0.17412   0.26538
X             10   0.19330     0.14331   0.24329
2             14   0.18014     0.13790   0.22239
PROC GLM does not partition the variance. There is a PROC NESTED that will partition the variance, but it only does the hypothesis testing for a balanced nested anova, so if you have an unbalanced design you'll want to run both PROC GLM and PROC NESTED. PROC NESTED requires that the data be sorted by groups and subgroups. To use it on the above data, add the following to the end of the above SAS program. PROC SORT sorts the data by the first variable in the BY statement, then by the second variable. In PROC NESTED, the group is given first in the CLASS statement, then the subgroup.
proc sort data=flies;
  by chromosome region;
proc nested data=flies;
  class chromosome region;
  var cbi;
run;
As you can see, PROC NESTED didn't calculate the F-statistics or P-values, since the fly data are unbalanced.
Variance             Sum of      F               Error   Mean        Variance     Percent
Source        DF     Squares     Value   Pr>F    Term    Square      Component    of Total

Total         35     0.408908                            0.011683     0.015670    100.0000
chromosome     2     0.010326                            0.005163    -0.004171      0.0000
region         3     0.162946                            0.054315     0.007816     49.8765
Error         30     0.235636                            0.007855     0.007855     50.1235
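The "Percent of Total" column can be recovered from the variance components if the negative chromosome estimate is treated as zero before the percentages are taken. That treatment matches the 0.0000 shown for chromosome, but the exact procedure PROC NESTED uses internally is an assumption here. A minimal Python sketch, with a hypothetical function name:

```python
def percent_of_total(components):
    """Convert variance components to percentages of their total.

    Negative estimates are set to zero first (an assumption that matches
    the 0.0000 reported for chromosome in the output above).
    """
    clipped = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    return {name: 100.0 * value / total for name, value in clipped.items()}

# Variance components copied from the PROC NESTED output.
components = {"chromosome": -0.004171, "region": 0.007816, "error": 0.007855}
percents = percent_of_total(components)

for name, percent in percents.items():
    print(name, round(percent, 2))
```

Because the inputs are rounded to six decimal places, the results agree with the SAS column to two decimal places rather than exactly.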
Further reading
Sokal and Rohlf, pp. 272-308.
Zar, pp. 303-311.
References
Conover, W.J., and R.L. Iman. 1981. Rank transformations as a bridge between parametric and nonparametric statistics. Am. Statistician 35: 124-129.
Keon, D.B., and P.S. Muir. 2002. Growth of Usnea longissima across a variety of habitats in the Oregon coast range. Bryologist 105: 233-242.
This page was last revised September 12, 2009. Its address is http://udel.edu/~mcdonald/statnested.html. It may be cited as pp. 173-181 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.