# Analysis of covariance

### When to use it

Analysis of covariance (ancova) is used when you have two measurement variables and two nominal variables. One of the nominal variables is the "hidden" nominal variable that groups the measurement observations into pairs, and the other nominal variable divides the regressions into two or more sets.

The purpose of ancova is to compare two or more linear regression lines. It is a way of comparing the Y variable among groups while statistically controlling for variation in Y caused by variation in the X variable.

For example, let's say you want to know whether the Cope's gray treefrog, Hyla chrysoscelis, has a different calling rate than the eastern gray treefrog, Hyla versicolor, which has twice as many chromosomes as H. chrysoscelis but is morphologically identical. As shown on the regression web page, the calling rate of eastern gray treefrogs is correlated with temperature, so you need to control for that. One way to control for temperature would be to bring the two species of frogs into a lab and keep them all at the same temperature, but you'd have to worry about whether their behavior in an artificial lab environment was really the same as in nature. In addition, you'd want to know whether one species had a higher calling rate at some temperatures, while the other species had a higher calling rate at other temperatures. It might be better to measure the calling rate of each species of frog at a variety of temperatures in nature, then use ancova to see whether the regression line of calling rate on temperature is significantly different between the two species.

### Null hypotheses

Two null hypotheses are tested in an ancova. The first is that the slopes of the regression lines are all the same. If this hypothesis is not rejected, the second null hypothesis is tested: that the Y-intercepts of the regression lines are all the same.

Although the most common use of ancova is for comparing two regression lines, it is possible to compare three or more regressions. If their slopes are all the same, it is then possible to do planned or unplanned comparisons of Y-intercepts, similar to the planned or unplanned comparisons of means in an anova. I won't cover that here.

### How it works

The first step in performing an ancova is to compute each regression line. In the frog example, there are two values of the species nominal variable, Hyla chrysoscelis and H. versicolor, so the regression line is calculated for calling rate vs. temperature for each species of frog.

Next, the slopes of the regression lines are compared; the null hypothesis that the slopes are the same is tested. The final step of the ancova, comparing the Y-intercepts, cannot be performed if the slopes are significantly different from each other. If the slopes of the regression lines are different, the lines cross each other somewhere, and one group has higher Y values in one part of the graph and lower Y values in another part of the graph. (If the slopes are different, there are techniques for testing the null hypothesis that the regression lines have the same Y-value for a particular X-value, but they're not used very often and I won't consider them here.)

If the slopes are significantly different, the ancova is done, and all you can say is that the slopes are significantly different. If the slopes are not significantly different, the next step in an ancova is to draw a regression line through each group of points, all with the same slope. This common slope is a weighted average of the slopes of the different groups.
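To make the weighted average concrete, here is a minimal Python sketch (it is not part of the original analysis). It assumes the usual least-squares weighting, in which each group's slope is weighted by its sum of squared deviations of X. The function names `sums_of_squares` and `common_slope` are made up for illustration, and the numbers are only the first five male and first five female rows of the alligator data from the SAS example, so the printed values are not the results for the full data set.

```python
def sums_of_squares(xs, ys):
    """Return (Sxx, Sxy): corrected sum of squares of X and sum of cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, sxy


def common_slope(groups):
    """Common slope: each group's slope (Sxy/Sxx) weighted by its Sxx.

    Algebraically this is the sum of Sxy over groups divided by the sum of Sxx.
    """
    parts = [sums_of_squares(xs, ys) for xs, ys in groups]
    return sum(sxy for _, sxy in parts) / sum(sxx for sxx, _ in parts)


# Subset of the alligator data (snout-vent length, pelvic canal width)
males = ([1.10, 1.19, 1.13, 1.15, 0.96], [7.62, 8.20, 8.00, 9.60, 6.50])
females = ([1.24, 1.02, 0.93, 0.71, 1.03], [7.64, 6.31, 5.90, 4.48, 6.03])

for name, (xs, ys) in (("male", males), ("female", females)):
    sxx, sxy = sums_of_squares(xs, ys)
    print(name, "slope:", round(sxy / sxx, 3), "weight (Sxx):", round(sxx, 4))
print("common slope:", round(common_slope([males, females]), 3))
```

A group with a wide range of X values has a large Sxx, so its slope counts for more in the common slope than the slope of a group whose X values are bunched together.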

The final test in the ancova is to test the null hypothesis that all of the Y-intercepts of the regression lines with a common slope are the same. Because the lines are parallel, saying that they are significantly different at one point (the Y-intercept) means that the lines are different at any point.

### Examples

Eggs laid vs. female weight in the firefly Photinus ignitus. Filled circles are females that have mated with three males; open circles are females that have mated with one male.

In the firefly species Photinus ignitus, the male transfers a large spermatophore to the female during mating. Rooney and Lewis (2002) wanted to know whether the extra resources from this "nuptial gift" enable the female to produce more offspring. They collected 40 virgin females and mated 20 of them to one male and 20 to three males. They then counted the number of eggs each female laid. Because fecundity varies with the size of the female, they analyzed the data using ancova, with female weight (before mating) as the independent measurement variable and number of eggs laid as the dependent measurement variable. Because the number of males has only two values ("one" or "three"), it is a nominal variable, not measurement.

The slopes of the two regression lines (one for single-mated females and one for triple-mated females) are not significantly different (F(1, 36)=1.1, P=0.30). The Y-intercepts are significantly different (F(1, 36)=8.8, P=0.005); females that have mated three times have significantly more offspring than females mated once.

 Skeleton of an American alligator.

Paleontologists would like to be able to determine the sex of dinosaurs from their fossilized bones. To see whether this is feasible, Prieto-Marquez et al. (2007) measured several characters that are thought to distinguish the sexes in alligators (Alligator mississippiensis), which are among the closest living relatives of dinosaurs. One of the characters was pelvic canal width, which they wanted to standardize using snout-vent length. The raw data are shown in the SAS example below.

The slopes of the regression lines are not significantly different (P=0.9101). The Y-intercepts are significantly different (P=0.0267), indicating that male alligators of a given length have a significantly greater pelvic canal width. However, inspection of the graph shows that there is a lot of overlap between the sexes even after standardizing for snout-vent length, so it would not be possible to reliably determine the sex of a single individual with this character alone.

 Pelvic canal width vs. snout-vent length in the American alligator. Blue circles and line are males; pink X's and line are females.

### Graphing the results

Data for an ancova are shown on a scattergraph, with the independent variable on the X-axis and the dependent variable on the Y-axis. A different symbol is used for each value of the nominal variable, as in the firefly graph above, where filled circles are used for the thrice-mated females and open circles are used for the once-mated females. To get this kind of graph in a spreadsheet, you would put all of the X-values in column A, one set of Y-values in column B, the next set of Y-values in column C, and so on.

Most people plot the individual regression lines for each set of points, as shown in the firefly graph, even if the slopes are not significantly different. This lets people see how similar or different the slopes look. This is easy to do in a spreadsheet; just click on one of the symbols and choose "Add Trendline" from the Chart menu.

### Similar tests

One alternative technique that is sometimes possible is to take the ratio of the two measurement variables, then use a one-way anova. For the mussel example I used for testing the homogeneity of means in one-way anova, I standardized the length of the anterior adductor muscle by dividing by the total length. There are technical problems with doing statistics on ratios of two measurement variables (the ratio of two normally distributed variables is not normally distributed), but if you can safely assume that the regression lines all pass through the origin (in this case, that a mussel that was 0 mm long would have an AAM length of 0 mm), this is not an unreasonable thing to do, and it simplifies the statistics. It would be important to graph the association between the variables and analyze it with linear regression to make sure that the relationship is linear and does pass through the origin.

Sometimes the two measurement variables are just the same variable measured at different times or places. For example, if you measured the weights of two groups of individuals, put some on a new weight-loss diet and the others on a control diet, then weighed them again a year later, you could treat the difference between final and initial weights as a single variable, and compare the mean weight loss for the control group to the mean weight loss of the diet group using a one-way anova. The alternative would be to treat final and initial weights as two different variables and analyze using an ancova: you would compare the regression line of final weight vs. initial weight for the control group to the regression line for the diet group. The anova would be simpler, and probably perfectly adequate; the ancova might be better, particularly if you had a wide range of initial weights, because it would allow you to see whether the change in weight depended on the initial weight.

One nonparametric alternative to ancova is to convert the measurement variables to ranks, then do a regular ancova on the ranks; see Conover and Iman (1982) for the details. There are several other versions of nonparametric ancova, but they appear to be less popular, and I don't know the advantages and disadvantages of each.
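The ranking step can be sketched in a few lines of Python. This is only an illustration under an assumption: that each measurement variable is ranked across all observations with the groups pooled, and that ties get the average of their ranks. Check Conover and Iman (1982) before relying on it. The function name `average_ranks` is made up, and the six rows are a subset of the alligator data from the SAS example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Three male and three female alligators (subset of the SAS example data)
sex = ["male", "male", "male", "female", "female", "female"]
snoutvent = [1.10, 1.19, 1.13, 1.24, 1.02, 0.93]
pelvicwidth = [7.62, 8.20, 8.00, 7.64, 6.31, 5.90]

x_ranks = average_ranks(snoutvent)    # ranked over both sexes together
y_ranks = average_ranks(pelvicwidth)
for row in zip(sex, x_ranks, y_ranks):
    print(*row)
```

The ranks would then replace the original measurements in an ordinary ancova, with the nominal variable left unchanged.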

### How to do the test

#### Spreadsheet and web pages

Richard Lowry has made web pages that allow you to perform ancova with two, three or four groups, and a downloadable spreadsheet for ancova with more than four groups. You may cut and paste data from a spreadsheet to the web pages. In the results, the P-value for "adjusted means" is the P-value for the difference in the intercepts among the regression lines; the P-value for "between regressions" is the P-value for the difference in slopes. One bug in the web pages is that very small values of P are not represented correctly. If the web page gives you a strange P-value (negative, greater than 1, "5e-7"), use the FDIST function of a spreadsheet along with the F value and degrees of freedom from the web page to calculate the correct P value. For example, if the F-value for the adjusted means is 281.37, the d.f. for the adjusted means is 1 and the d.f. for the adjusted error is 84, go to a spreadsheet and enter "=FDIST(281.37, 1, 84)" to get the correct P-value. To get the P-value for the slopes, use the d.f. for "between regressions" and "remainder."
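If you don't have a spreadsheet handy, the same upper-tail probability can be approximated in Python using only the standard library. The sketch below is not a library function: `f_upper_tail` is a made-up name, and it works by numerically integrating the beta-function form of the F-distribution with Simpson's rule. It assumes the F-value is greater than zero and the denominator d.f. is at least 2.

```python
import math


def f_upper_tail(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value) for an F-distribution with (df_num, df_den) d.f.

    Uses the identity P(F > f) = I_x(df_den/2, df_num/2), where
    x = df_den / (df_den + df_num * f) and I is the regularized incomplete beta.
    """
    if f_value <= 0 or df_den < 2:
        raise ValueError("this sketch needs f_value > 0 and df_den >= 2")
    a = df_den / 2.0
    b = df_num / 2.0
    upper = df_den / (df_den + df_num * f_value)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    # Composite Simpson's rule on [0, upper]; steps must be even
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Divide by the complete beta function B(a, b), computed on the log scale
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


# Same inputs as the spreadsheet example "=FDIST(281.37, 1, 84)"
print(f_upper_tail(281.37, 1, 84))
```

The arguments are in the same order as FDIST: the F-value, the numerator d.f., then the denominator d.f.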

#### SAS

Here's an illustration of how to do analysis of covariance in SAS, using the data from Prieto-Marquez et al. (2007) on snout-vent length and pelvic canal width in alligators:

```
data gators;
input sex $ snoutvent pelvicwidth;
cards;
male      1.10    7.62
male      1.19    8.20
male      1.13    8.00
male      1.15    9.60
male      0.96    6.50
male      1.19    8.17
male      1.06    7.20
male      0.70    4.65
male      0.70    5.04
male      1.04    8.83
male      1.15    8.01
male      1.10    6.84
male      1.15    8.37
male      1.15    7.36
male      0.91    6.43
male      1.45    9.43
male      1.22    7.70
male      1.33   10.20
male      1.38    9.14
female    1.24    7.64
female    1.02    6.31
female    0.93    5.90
female    0.71    4.48
female    1.03    6.03
female    1.02    6.60
female    0.95    5.88
female    1.03    6.77
female    0.96    6.47
female    1.16    7.56
female    0.93    6.13
female    1.04    6.76
female    1.03    6.63
female    0.93    5.93
female    0.85    6.52
female    1.23    9.23
;
* First run: the interaction term tests whether the slopes differ;
proc glm data=gators;
class sex;
model pelvicwidth=snoutvent sex snoutvent*sex;
* Second run: no interaction term, so the test of sex compares Y-intercepts;
proc glm data=gators;
class sex;
model pelvicwidth=snoutvent sex;
run;
```

The first time you run PROC GLM, the MODEL statement includes the interaction term (SNOUTVENT*SEX). This tests whether the slopes of the regression lines are significantly different:

```
                     Type III     Mean
Source         DF       SS       Square   F Value   Pr > F

snoutvent       1     33.949     33.949     88.05   <.0001
sex             1      0.079      0.079      0.21   0.6537
snoutvent*sex   1      0.005      0.005      0.01   0.9101   slope P-value

```

If the P-value of the slopes is significant, you'd be done. In this case it isn't, so you look at the output from the second run of PROC GLM. This time, the MODEL statement doesn't include the interaction term, so the model assumes that the slopes of the regression lines are equal. This P-value tells you whether the Y-intercepts are significantly different:

```
                     Type III     Mean
Source         DF       SS       Square   F Value   Pr > F

snoutvent       1     41.388     41.388    110.76   <.0001
sex             1      2.016      2.016      5.39   0.0267   intercept P-value

```
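As a cross-check on the SAS output, the two F-values can be recomputed by hand. The Python sketch below is an illustration, not SAS's own algorithm: it compares the residual sums of squares of three models (a separate line for each group, parallel lines with a common slope, and a single line for all the data). The names `corrected_sums` and `ancova_f` are made up; the data are the alligator measurements from the SAS example above.

```python
def corrected_sums(points):
    """Return (Sxx, Sxy, Syy) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxx, sxy, syy


def ancova_f(groups):
    """Return (F for slopes, its d.f., F for intercepts, its d.f.).

    groups maps each group name to a list of (x, y) pairs.
    """
    k = len(groups)
    n_total = sum(len(points) for points in groups.values())
    within = [corrected_sums(points) for points in groups.values()]

    # Model 1: a separate regression line for each group
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in within)
    # Model 2: parallel lines (common slope, separate intercepts)
    pooled_sxx = sum(s[0] for s in within)
    pooled_sxy = sum(s[1] for s in within)
    pooled_syy = sum(s[2] for s in within)
    sse_parallel = pooled_syy - pooled_sxy ** 2 / pooled_sxx
    # Model 3: one line through all the points, ignoring group
    sxx, sxy, syy = corrected_sums([pt for points in groups.values() for pt in points])
    sse_single = syy - sxy ** 2 / sxx

    df_separate = n_total - 2 * k
    df_parallel = n_total - k - 1
    f_slopes = ((sse_parallel - sse_separate) / (k - 1)) / (sse_separate / df_separate)
    f_intercepts = ((sse_single - sse_parallel) / (k - 1)) / (sse_parallel / df_parallel)
    return f_slopes, (k - 1, df_separate), f_intercepts, (k - 1, df_parallel)


male_x = [1.10, 1.19, 1.13, 1.15, 0.96, 1.19, 1.06, 0.70, 0.70, 1.04,
          1.15, 1.10, 1.15, 1.15, 0.91, 1.45, 1.22, 1.33, 1.38]
male_y = [7.62, 8.20, 8.00, 9.60, 6.50, 8.17, 7.20, 4.65, 5.04, 8.83,
          8.01, 6.84, 8.37, 7.36, 6.43, 9.43, 7.70, 10.20, 9.14]
female_x = [1.24, 1.02, 0.93, 0.71, 1.03, 1.02, 0.95, 1.03,
            0.96, 1.16, 0.93, 1.04, 1.03, 0.93, 0.85, 1.23]
female_y = [7.64, 6.31, 5.90, 4.48, 6.03, 6.60, 5.88, 6.77,
            6.47, 7.56, 6.13, 6.76, 6.63, 5.93, 6.52, 9.23]
gators = {"male": list(zip(male_x, male_y)), "female": list(zip(female_x, female_y))}

f_slopes, df_slopes, f_intercepts, df_intercepts = ancova_f(gators)
print("slopes:     F =", round(f_slopes, 2), "d.f. =", df_slopes)
print("intercepts: F =", round(f_intercepts, 2), "d.f. =", df_intercepts)
```

The two F-values should agree with the 0.01 and 5.39 in the SAS output. To turn them into P-values, you could use a spreadsheet's FDIST function with the F-value and the two d.f. values.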

### Power analysis

The form on the web version of this handbook calculates the sample size needed for an ancova, using the method of Borm et al. (2007). It only works for ancova with two groups, and it assumes each group has the same standard deviation and the same r². To use it, enter:

• the effect size, or the difference in Y-intercepts you hope to detect;
• the standard deviation. This is the standard deviation of all the Y values within each group (without controlling for the X variable). For example, in the alligator data above, this would be the standard deviation of pelvic width among males, or the standard deviation of pelvic width among females.
• alpha, or the significance level (usually 0.05);
• power, the probability of rejecting the null hypothesis when the given effect size is the true difference (0.80 and 0.90 are common values);
• the r² within groups. For the alligator data, this would be the r² of pelvic width vs. snout-vent length among males, or the r² among females.

As an example, let's say you want to do a study with an ancova on pelvic width vs. snout-vent length in male and female crocodiles, and since you don't have any preliminary data on crocodiles, you're going to base your sample size calculation on the alligator data. You want to detect a difference in adjusted means of 0.2 cm. The standard deviation of pelvic width in the male alligators is 1.45 and for females is 1.02; taking the average, enter 1.23 for standard deviation. The r² in males is 0.774 and for females it's 0.780, so enter the average (0.777) for r² in the form. With 0.05 for the alpha and 0.80 for the power, the result is that you'll need 133 male crocodiles and 133 female crocodiles.
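The arithmetic behind a calculation like this can be sketched in Python. The sketch rests on an assumption rather than on the web form's actual code: it takes the normal-approximation sample size for comparing two means and multiplies it by (1 − r²), rounding up. The web form may apply further corrections that are not described here. The function name `ancova_sample_size` and its parameter names are made up for illustration.

```python
import math
from statistics import NormalDist


def ancova_sample_size(difference, sd, r_squared, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size per group for an ancova with two groups.

    Assumption: two-sample normal-approximation sample size times (1 - r_squared).
    """
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_power = NormalDist().inv_cdf(power)
    # Sample size per group if you ignored the X variable entirely
    n_unadjusted = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    # Controlling for X removes the fraction r_squared of the variance in Y
    return math.ceil((1 - r_squared) * n_unadjusted)


# Crocodile example: difference 0.2 cm, SD 1.23, r-squared 0.777
n_per_group = ancova_sample_size(difference=0.2, sd=1.23, r_squared=0.777)
print(n_per_group)  # 133
```

With the crocodile numbers, this gives the same 133 per group as the example. The (1 − r²) factor shows why a strong relationship between X and Y pays off: the higher the r² within groups, the fewer animals you need.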

### Further reading

Sokal and Rohlf, pp. 499-521.

### References

Frog calls from Knapp, W.S. 2007. The Frogs and Toads of Georgia.

Alligator skeleton from University of Washington: Biology 453, Comparative Vertebrate Anatomy.

Borm, G.F., J. Fransen, and W.A.J.G. Lemmens. 2007. A simple sample size formula for analysis of covariance in randomized clinical trials. J. Clin. Epidem. 60: 1234-1238.

Conover, W.J., and R.L. Iman. 1982. Analysis of covariance using the rank transformation. Biometrics 38: 715-724.

Prieto-Marquez, A., P.M. Gignac, and S. Joshi. 2007. Neontological evaluation of pelvic skeletal attributes purported to reflect sex in extinct non-avian archosaurs. J. Vert. Paleontol. 27: 603-609.

Rooney, J., and S.M. Lewis. 2002. Fitness advantage from nuptial gifts in female fireflies. Ecol. Entom. 27: 373-377.

This page was last revised September 14, 2009. Its address is http://udel.edu/~mcdonald/statancova.html. It may be cited as pp. 232-237 in: McDonald, J.H. 2009. Handbook of Biological Statistics. Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.