
Two-way anova

When to use it

You use a two-way anova (also known as a factorial anova, with two factors) when you have one measurement variable and two nominal variables. The nominal variables (often called "factors" or "main effects") are found in all possible combinations. For example, let's say you are testing the null hypothesis that stressed and unstressed rats have the same glycogen content in their gastrocnemius muscle, and you are worried that there might be sex-related differences in glycogen content as well. The two factors are stress level (stressed vs. unstressed) and sex (male vs. female). Unlike a nested anova, each grouping extends across the other grouping. In a nested anova, you might have "cage 1" and "cage 2" nested entirely within the stressed group, while "cage 3" and "cage 4" were nested within the unstressed group. In a two-way anova, the stressed group contains both male and female rats, and the unstressed group also contains both male and female rats. The factors used to group the observations may both be model I, may both be model II, or may be one of each ("mixed model").

A two-way anova may be done with replication (more than one observation for each combination of the nominal variables) or without replication (only one observation for each combination of the nominal variables).


Assumptions

Two-way anova, like all anovas, assumes that the observations within each cell are normally distributed and have equal variances.

Two-way anova with replication

Null hypotheses: The results of a two-way anova with replication include tests of three null hypotheses: that the means of observations grouped by one factor are the same; that the means of observations grouped by the other factor are the same; and that there is no interaction between the two factors. The interaction test tells you whether the effects of one factor depend on the other factor. In the rat example, imagine that stressed and unstressed female rats have about the same glycogen level, while stressed male rats have much lower glycogen levels than unstressed male rats. The different effects of stress on female and male rats would result in a significant interaction term in the anova. When the interaction term is significant, the usual advice is that you should not test the effects of the individual factors. In this example, it would be misleading to examine the individual factors and conclude "Stressed rats have lower glycogen than unstressed," when that is only true for male rats, or "Male rats have lower glycogen than female rats," when that is only true when they are stressed.

What you can do, if the interaction term is significant, is look at each factor separately, using a one-way anova. In the rat example, you might be able to say that for female rats, the mean glycogen levels for stressed and unstressed rats are not significantly different, while for male rats, stressed rats have a significantly lower mean glycogen level than unstressed rats. Or, if you're more interested in the sex difference, you might say that male rats have a significantly lower mean glycogen level than female rats under stress conditions, while the mean glycogen levels do not differ significantly under unstressed conditions.

How the test works: When the sample sizes in each subgroup are equal (a "balanced design"), the mean square is calculated for each of the two factors (the "main effects"), for the interaction, and for the variation within each combination of factors. Each F-statistic is found by dividing a mean square by the within-subgroup mean square.
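As a compact summary of the balanced case, the partition can be sketched as follows. The symbols are introduced here for illustration and are not from the original text: a and b are the numbers of levels of the two factors A and B, and n is the number of observations in each subgroup. The F ratios use the within-subgroup mean square as the denominator, as described above.

```latex
\begin{align*}
SS_{\text{total}} &= SS_A + SS_B + SS_{AB} + SS_{\text{within}} \\
df_A &= a - 1, \quad df_B = b - 1, \quad df_{AB} = (a-1)(b-1), \quad df_{\text{within}} = ab(n-1) \\
MS &= SS / df \quad \text{for each source of variation} \\
F_A &= \frac{MS_A}{MS_{\text{within}}}, \quad
F_B = \frac{MS_B}{MS_{\text{within}}}, \quad
F_{AB} = \frac{MS_{AB}}{MS_{\text{within}}}
\end{align*}
```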

When the sample sizes for the subgroups are not equal (an "unbalanced design"), the analysis is much more complicated, and there are several different techniques for testing the main and interaction effects. The details of this are beyond the scope of this handbook. If you're doing a two-way anova, your statistical life will be a lot easier if you make it a balanced design.

Two-way anova without replication

Null hypotheses: When there is only a single observation for each combination of the nominal variables, there are only two null hypotheses: that the means of observations grouped by one factor are the same, and that the means of observations grouped by the other factor are the same. It is impossible to test the null hypothesis of no interaction. Testing the two null hypotheses about the main effects requires assuming that there is no interaction.

How the test works: The sum of squares is calculated for each of the two main effects, and a total sum of squares is also calculated by considering all of the observations as a single group. The remainder sum of squares is found by subtracting the two main effect sums of squares from the total sum of squares. Dividing each sum of squares by its degrees of freedom gives the mean squares; the remainder mean square is also called the discrepance or error mean square. The F-statistic for a main effect is the main effect mean square divided by the remainder mean square.
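In terms of sums of squares, the calculation without replication can be sketched as follows. The symbols are assumptions made for this sketch: a and b are the numbers of levels of the two factors A and B, with one observation per combination. Because the remainder is the only source left after the main effects are removed, there are no degrees of freedom left over for a separate interaction test.

```latex
\begin{align*}
SS_{\text{remainder}} &= SS_{\text{total}} - SS_A - SS_B \\
df_{\text{remainder}} &= (ab - 1) - (a - 1) - (b - 1) = (a-1)(b-1) \\
MS_{\text{remainder}} &= \frac{SS_{\text{remainder}}}{df_{\text{remainder}}} \\
F_A &= \frac{MS_A}{MS_{\text{remainder}}}, \quad
F_B = \frac{MS_B}{MS_{\text{remainder}}}
\end{align*}
```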

Repeated measures: One experimental design that is analyzed by a two-way anova is repeated measures, where an observation has been made on the same individual more than once. This usually involves measurements taken at different time points. For example, you might measure running speed before, one week into, and three weeks into a program of exercise. Because individuals would start with different running speeds, it is better to analyze using a two-way anova, with "individual" as one of the factors, rather than lumping everyone together and analyzing with a one-way anova. Sometimes the repeated measures are repeated at different places rather than different times, such as the hip abduction angle measured on the right and left hip of individuals. Repeated measures experiments are often done without replication, although they could be done with replication.

In a repeated measures design, one of the main effects is usually uninteresting and the test of its null hypothesis may not be reported. If the goal is to determine whether a particular exercise program affects running speed, there would be little point in testing whether individuals differed from each other in their average running speed; only the change in running speed over time would be of interest.

Randomized blocks: Another experimental design that is analyzed by a two-way anova is randomized blocks. This often occurs in agriculture, where you may want to test different treatments on small plots within larger blocks of land. Because the larger blocks may differ in some way that may affect the measurement variable, the data are analyzed with a two-way anova, with the block as one of the nominal variables. Each treatment is applied to one or more plots within each block, and the positions of the treatments are assigned at random. This is most commonly done without replication (one plot per treatment in each block), but it can be done with replication as well.


Sweetpotato weevil
The West Indian sweetpotato weevil, Euscepes postfasciatus.

Shimoji and Miyatake (2002) raised the West Indian sweetpotato weevil for 14 generations on an artificial diet. They compared these artificial diet weevils (AD strain) with weevils raised on sweet potato roots (SP strain), the weevil's natural food. Multiple females of each strain were placed on either the artificial diet or sweet potato root, and the number of eggs each female laid over a 28-day period was counted. There are two nominal variables, the strain of weevil (AD or SP) and the oviposition test food (artificial diet or sweet potato), and one measurement variable (the number of eggs laid).

Graph of eggs laid by weevils
Mean total numbers of eggs of females from the SP strain (gray bars) and AD strain (white bars). Values are mean ±SEM. (Adapted from Fig. 4 of Shimoji and Miyatake [2002]).

The results of the two-way anova with replication include a significant interaction term (F1, 117=17.02, P=7×10^-5). Looking at the graph, the interaction can be interpreted this way: on the sweet potato diet, the SP strain laid more eggs than the AD strain; on the artificial diet, the AD strain laid more eggs than the SP strain. Each main effect is also significant: weevil strain (F1, 117=8.82, P=0.0036) and oviposition test food (F1, 117=345.92, P=9×10^-37). However, the significant effect of strain is a bit misleading, as the direction of the difference between strains depends on which food they ate. This is why it is important to look at the interaction term first.

I assayed the activity of the enzyme mannose-6-phosphate isomerase (MPI) in the amphipod crustacean Platorchestia platensis (McDonald, unpublished data). There are three genotypes at the locus for MPI, Mpi^ff, Mpi^fs, and Mpi^ss, and I wanted to know whether the genotypes had different activity. Because I didn't know whether sex would affect activity, I also recorded the sex. Each amphipod was lyophilized, weighed, and homogenized; then MPI activity of the soluble portion was assayed. The data (in ΔO.D. units/sec/mg dry weight) are shown below as part of the SAS example. The results indicate that the interaction term, the effect of sex and the effect of genotype are all non-significant.

Place and Abramson (2008) put diamondback rattlesnakes (Crotalus atrox) in a "rattlebox," a box with a lid that would slide open and shut every 5 minutes. At first, the snake would rattle its tail each time the box opened. After a while, the snake would become habituated to the box opening and stop rattling its tail. They counted the number of box openings until a snake stopped rattling; fewer box openings means the snake was more quickly habituated. They repeated this experiment on each snake on four successive days. Place and Abramson (2008) used 10 snakes, but some of them never became habituated; to simplify this example, I'll use data from the 6 snakes that did become habituated on each day:

Snake ID   Day  Trials to habituation
   D1       1      85
            2      58
            3      15
            4      57

   D3       1      107
            2      51
            3      30
            4      12

   D5       1      61
            2      60
            3      68
            4      36

   D8       1      22
            2      41
            3      63
            4      21

   D11      1      40
            2      45
            3      28
            4      10

   D12      1      65
            2      27
            3      3
            4      16 

Graph of rattlesnake habituation
Mean number of trials before rattlesnakes stopped rattling, on four successive days. Values are mean ±95% confidence intervals. Data from Place and Abramson (2008).

The measurement variable is trials to habituation, and the two nominal variables are day (1 to 4) and snake ID. This is a repeated measures design, as the measurement variable is measured repeatedly on each snake. It is analyzed using a two-way anova without replication. The effect of snake is not significant (F5, 15=1.24, P=0.34), while the effect of day is significant (F3, 15=3.32, P=0.049).
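The F-statistics reported above can be reproduced from the table with a few lines of arithmetic. The following is a minimal sketch in Python using only the standard library; the function and variable names are made up for this illustration. It stops at the F-statistics, because the P values need the F distribution, which the standard library does not provide.

```python
# Two-way anova without replication on the rattlesnake data from the table above.
# Keys are snake IDs; each list holds trials to habituation on days 1 to 4.
trials = {
    "D1":  [85, 58, 15, 57],
    "D3":  [107, 51, 30, 12],
    "D5":  [61, 60, 68, 36],
    "D8":  [22, 41, 63, 21],
    "D11": [40, 45, 28, 10],
    "D12": [65, 27, 3, 16],
}


def two_way_no_replication(table):
    """Return F-statistics and degrees of freedom for rows and columns."""
    rows = list(table.values())
    a = len(rows)        # levels of the row factor (snakes)
    b = len(rows[0])     # levels of the column factor (days)
    grand = sum(sum(r) for r in rows) / (a * b)
    row_means = [sum(r) / b for r in rows]
    col_means = [sum(r[j] for r in rows) / a for j in range(b)]

    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    # The remainder is what is left after removing both main effects.
    ss_rem = ss_total - ss_rows - ss_cols

    df_rows, df_cols = a - 1, b - 1
    df_rem = df_rows * df_cols
    ms_rem = ss_rem / df_rem
    return {
        "F_rows": (ss_rows / df_rows) / ms_rem,
        "F_cols": (ss_cols / df_cols) / ms_rem,
        "df": (df_rows, df_cols, df_rem),
    }


result = two_way_no_replication(trials)
df_snake, df_day, df_rem = result["df"]
print("snake: F(%d, %d) = %.2f" % (df_snake, df_rem, result["F_rows"]))
print("day:   F(%d, %d) = %.2f" % (df_day, df_rem, result["F_cols"]))
```

Run as written, this should give F=1.24 with 5 and 15 degrees of freedom for snake and F=3.32 with 3 and 15 degrees of freedom for day, matching the values in the text.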

Graphing the results

Ugly 3-dimensional graph
Don't use this kind of graph. Which bar is higher: fs in females or ss in males?

Sometimes the results of a two-way anova are plotted on a 3-D graph, with the measurement variable on the Y-axis, one nominal variable on the X-axis, and the other nominal variable on the Z-axis (going into the paper). This makes it difficult to visually compare the heights of the bars in the front and back rows, so I don't recommend this. Instead, I suggest you plot a bar graph with the bars clustered by one nominal variable, with the other nominal variable identified using the color or pattern of the bars.

Graph of MPI activity in amphipods
Mannose-6-phosphate isomerase activity in three MPI genotypes in the amphipod crustacean Platorchestia platensis. Isn't this graph much better?

If one of the nominal variables is the interesting one, and the other is just a possible confounder, I'd group the bars by the possible confounder and use different patterns for the interesting variable. For the amphipod data described above, I was interested in seeing whether MPI phenotype affected enzyme activity, with any difference between males and females as an annoying confounder, so I group the bars by sex.

Similar tests

A two-way anova without replication and only two values for the interesting nominal variable may be analyzed using a paired t-test. The results of a paired t-test are mathematically identical to those of a two-way anova, but the paired t-test is easier to do. Data sets with one measurement variable and two nominal variables, with one nominal variable nested under the other, are analyzed with a nested anova.
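The equivalence between the paired t-test and the two-way anova without replication can be checked numerically: the square of the paired t-statistic equals the F-statistic for the two-level factor. The sketch below uses the day 1 and day 4 values for the six rattlesnakes in the table above. Picking those two days is purely for illustration and is not an analysis from the original study; the function names are also made up for this example.

```python
import math

# Trials to habituation on day 1 and day 4 for snakes D1, D3, D5, D8, D11, D12.
day1 = [85, 107, 61, 22, 40, 65]
day4 = [57, 12, 36, 21, 10, 16]


def paired_t(x, y):
    """Paired t-statistic: mean difference divided by its standard error."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def anova_f_two_columns(x, y):
    """F for the two-level factor in a two-way anova without replication,
    with individuals as the other factor."""
    rows = list(zip(x, y))
    a, b = len(rows), 2
    grand = (sum(x) + sum(y)) / (a * b)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_rows = b * sum((sum(r) / b - grand) ** 2 for r in rows)
    ss_cols = a * ((sum(x) / a - grand) ** 2 + (sum(y) / a - grand) ** 2)
    ss_rem = ss_total - ss_rows - ss_cols
    ms_rem = ss_rem / ((a - 1) * (b - 1))
    return (ss_cols / (b - 1)) / ms_rem


t = paired_t(day1, day4)
f = anova_f_two_columns(day1, day4)
print("t squared = %.4f, F = %.4f" % (t * t, f))
```

The two printed numbers should agree apart from floating-point rounding. The paired t-test has n-1 degrees of freedom, the same as the remainder degrees of freedom in the anova when one factor has only two levels.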

Data in which the measurement variable is severely non-normal or heteroscedastic may be analyzed using the non-parametric Friedman's method (for a two-way design without replication) or the Scheirer–Ray–Hare technique (for a two-way design with replication). See Sokal and Rohlf (1995), pp. 440-447. I don't know how to tell whether the non-normality or heteroscedasticity in your data are so bad that a two-way anova would be inappropriate.

Three-way and higher order anovas are possible, as are anovas combining aspects of a nested and a two-way or higher order anova. The number of interaction terms increases rapidly as designs get more complicated, and the interpretation of any significant interactions can be quite difficult. It is better, when possible, to design your experiments so that as many factors as possible are controlled, rather than collecting a hodgepodge of data and hoping that a sophisticated statistical analysis can make some sense of it.

How to do the test


Spreadsheet

I haven't put together a spreadsheet to do two-way anovas.

Web pages

Web pages are available to perform a 2x2 two-way anova with replication, and a two-way anova of up to 6x4 either without replication or with up to 4 replicates.

Rweb lets you do two-way anovas with or without replication. To use it, choose "ANOVA" from the Analysis Menu and choose "External Data: Use an option below" from the Data Set Menu, then either select a file to analyze or enter your data in the box. On the next page (after clicking on "Submit"), select the two nominal variables under "Choose the Factors" and select the measurement variable under "Choose the response."


SAS

Use PROC GLM for a two-way anova. Here is an example using the MPI activity data described above:

data amphipods;
   input ID $ sex $ genotype $ activity;
   datalines;
1       male     ff   1.884
2       male     ff   2.283
3       male     fs   2.396
4       female   ff   2.838
5       male     fs   2.956
6       female   ff   4.216
7       female   ss   3.620
8       female   ff   2.889
9       female   fs   3.550
10      male     fs   3.105
11      female   fs   4.556
12      female   fs   3.087
13      male     ff   4.939
14      male     ff   3.486
15      female   ss   3.079
16      male     fs   2.649
17      female   fs   1.943
19      female   ff   4.198
20      female   ff   2.473
22      female   ff   2.033
24      female   fs   2.200
25      female   fs   2.157
26      male     ss   2.801
28      male     ss   3.421
29      female   ff   1.811
30      female   fs   4.281
32      female   fs   4.772
34      female   ss   3.586
36      female   ff   3.944
38      female   ss   2.669
39      female   ss   3.050
41      male     ss   4.275
43      female   ss   2.963
46      female   ss   3.236
48      female   ss   3.673
49      male     ss   3.110
;
* CLASS names the two nominal variables and MODEL lists both main effects and their interaction;
proc glm data=amphipods;
  class sex genotype;
  model activity=sex genotype sex*genotype;
run;

The results indicate that the interaction term is not significant (P=0.60), the effect of genotype is not significant (P=0.84), and the effect of sex is not significant (P=0.77).

Source        DF     Type I SS    Mean Square   F Value    Pr > F

sex            1    0.06808050     0.06808050      0.09    0.7712
genotype       2    0.27724017     0.13862008      0.18    0.8400
sex*genotype   2    0.81464133     0.40732067      0.52    0.6025
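As a cross-check on the table, the sums of squares can be recomputed by hand from the data in the SAS example. The following Python sketch is not part of the SAS analysis and uses made-up helper names. It relies on one assumption worth stating loudly: the subgroup sizes here are proportional (4 males and 8 females for every genotype), so the simple between-group sums of squares coincide with the sequential Type I sums of squares. For an unbalanced design without proportional subgroup sizes, this shortcut would not apply.

```python
# MPI activity grouped by (sex, genotype), copied from the SAS data step above.
activity = {
    ("male", "ff"): [1.884, 2.283, 4.939, 3.486],
    ("male", "fs"): [2.396, 2.956, 3.105, 2.649],
    ("male", "ss"): [2.801, 3.421, 4.275, 3.110],
    ("female", "ff"): [2.838, 4.216, 2.889, 4.198, 2.473, 2.033, 1.811, 3.944],
    ("female", "fs"): [3.550, 4.556, 3.087, 1.943, 2.200, 2.157, 4.281, 4.772],
    ("female", "ss"): [3.620, 3.079, 3.586, 2.669, 3.050, 2.963, 3.236, 3.673],
}


def mean(values):
    return sum(values) / len(values)


def between_ss(groups, grand):
    """Sum over groups of n * (group mean - grand mean)^2."""
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)


def pooled(index, level):
    """All observations sharing one level of sex (index 0) or genotype (index 1)."""
    return [x for key, vals in activity.items() if key[index] == level for x in vals]


all_obs = [x for vals in activity.values() for x in vals]
grand = mean(all_obs)

ss_sex = between_ss([pooled(0, s) for s in ("male", "female")], grand)
ss_genotype = between_ss([pooled(1, g) for g in ("ff", "fs", "ss")], grand)
ss_cells = between_ss(list(activity.values()), grand)
# Valid here only because the subgroup sizes are proportional (see the note above).
ss_interaction = ss_cells - ss_sex - ss_genotype
ss_within = sum((x - grand) ** 2 for x in all_obs) - ss_cells
df_within = len(all_obs) - len(activity)
ms_within = ss_within / df_within

for name, ss, df in [("sex", ss_sex, 1), ("genotype", ss_genotype, 2),
                     ("sex*genotype", ss_interaction, 2)]:
    print("%-13s SS = %.4f  F = %.2f" % (name, ss, (ss / df) / ms_within))
```

The printed sums of squares should agree with the Type I SS column in the table, and the F values can be compared with the F Value column.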

If you are using SAS to do a two-way anova without replication, do not put an interaction term in the model statement (sex*genotype is the interaction term in the example above).

Further reading

Sokal and Rohlf, pp. 321-342.

Zar, pp. 231-271.


Picture of a weevil from Okinawa Prefectural Fruit Fly Eradication Office.

Place, A.J., and C.I. Abramson. 2008. Habituation of the rattle response in western diamondback rattlesnakes, Crotalus atrox. Copeia 2008: 835-843.

Shimoji, Y., and T. Miyatake. 2002. Adaptation to artificial rearing during successive generations in the West Indian sweetpotato weevil, Euscepes postfasciatus (Coleoptera: Curculionidae). Annals of the Entomological Society of America 95: 735-739.


Alternate terms: Two-way analysis of variance, factorial analysis of variance.

This page was last revised September 14, 2009. Its address is It may be cited as pp. 182-190 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.