# Cochran–Mantel–Haenszel test for repeated tests of independence

### When to use it

You use the Cochran–Mantel–Haenszel test (which is sometimes called the Mantel–Haenszel test) for repeated tests of independence. There are three nominal variables; you want to know whether two of the variables are independent of each other, and the third variable identifies the repeats. The most common situation is that you have multiple 2×2 tables of independence, so that's what I'll talk about here. There are versions of the Cochran–Mantel–Haenszel test for any number of rows and columns in the individual tests of independence, but I won't cover them.

For example, let's say you've found several hundred pink knit polyester legwarmers that have been hidden in a warehouse since they went out of style in 1984. You decide to see whether they reduce the pain of ankle osteoarthritis by keeping the ankles warm. In the winter, you recruit 36 volunteers with ankle arthritis, randomly assign 20 to wear the legwarmers under their clothes at all times while the other 16 don't wear the legwarmers, then after a month you ask them whether their ankles are pain-free or not. With just the one set of people, you'd have two nominal variables (legwarmers vs. control, pain-free vs. pain), each with two values, so you'd analyze the data with Fisher's exact test.

However, let's say you repeat the experiment in the spring, with 50 new volunteers. Then in the summer you repeat the experiment again, with 28 new volunteers. You could just add all the data together and do Fisher's exact test on the 114 total people, but it would be better to keep each of the three experiments separate. Maybe the first time you did the experiment there was an overall higher level of ankle pain than the second time, because of the different time of year or the different set of volunteers. You want to see whether there's an overall effect of legwarmers on ankle pain, but you want to control for the possibility of different levels of ankle pain at the different times of year.

### Null hypothesis

The null hypothesis is that the two nominal variables that are tested within each repetition are independent of each other; having one value of one variable does not mean that it's more likely that you'll have one value of the second variable. For your imaginary legwarmers experiment, the null hypothesis would be that the proportion of people feeling pain was the same for legwarmer-wearers and non-legwarmer wearers, after controlling for the time of year. The alternative hypothesis is that the proportion of people feeling pain was different for legwarmer and non-legwarmer wearers.

Technically, the null hypothesis of the Cochran–Mantel–Haenszel test is that the odds ratios within each repetition are equal to 1. The odds ratio is equal to 1 when the proportions are the same, and the odds ratio is different from 1 when the proportions are different from each other. I think proportions are easier to grasp than odds ratios, so I'll put everything in terms of proportions.
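To make the link between odds ratios and proportions concrete, here is a minimal Python sketch; `odds_ratio` is a hypothetical helper, not part of any library. It computes *ad*/*bc* for a table with *a* and *b* in the first row and *c* and *d* in the second, and shows that equal column proportions give an odds ratio of exactly 1, while the Tillamook mussel counts from the Examples section (unequal proportions) give an odds ratio above 1.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as
         a  b
         c  d
    i.e. (a/c) / (b/d), which simplifies to (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Hypothetical table: 10 of 40 (25%) in column 1 and 20 of 80 (25%) in
# column 2 fall in row 1. Equal proportions, so the odds ratio is exactly 1.
print(odds_ratio(10, 20, 30, 60))            # 1.0

# Tillamook mussel counts (94 / non-94 rows, marine / estuarine columns):
# 56/96 = 58% vs 69/146 = 47% Lap-94, so the odds ratio is above 1.
print(round(odds_ratio(56, 69, 40, 77), 2))  # 1.56
```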

### How it works

If the four numbers in a 2×2 test of independence are labelled like this:

$$\begin{array}{cc} a & b \\ c & d \end{array}$$

and (a+b+c+d)=n, the equation for the Cochran–Mantel–Haenszel test statistic can be written like this:

$$\chi^{2}_{MH}=\frac{\left\{\left|\sum\left[a-(a+b)(a+c)/n\right]\right|-0.5\right\}^{2}}{\sum (a+b)(a+c)(b+d)(c+d)/(n^{3}-n^{2})}$$

The numerator contains the difference between the observed value in one cell (*a*) and the expected value under the null hypothesis, (a+b)(a+c)/n, summed over all the tables; 0.5 is subtracted from the absolute value of this sum as a continuity correction, and the result is squared. It doesn't matter which of the four cells you call *a*, as long as every table is arranged the same way. The denominator is an estimate of the variance of the summed differences under the null hypothesis.
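The per-table pieces of the formula can be sketched in Python as below (the function name `deviation_and_variance` is only for illustration). Using the Tillamook mussel counts from the Examples section, it also checks the point about choosing *a*: swapping the columns moves a different cell into position *a*, which flips the sign of the observed-minus-expected deviation but leaves the variance unchanged. That is why every table has to be arranged the same way; mixing arrangements would let deviations from different tables cancel.

```python
def deviation_and_variance(a, b, c, d):
    """One 2x2 table's contribution to the CMH statistic:
    observed minus expected for cell a, and the null variance of cell a."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    variance = (a + b) * (a + c) * (b + d) * (c + d) / (n**3 - n**2)
    return a - expected, variance

# Tillamook table with cell "a" = Lap-94 alleles in the marine sample.
dev_a, var_a = deviation_and_variance(56, 69, 40, 77)
# Same table with the columns swapped, so "a" is now the estuarine 94 count.
dev_b, var_b = deviation_and_variance(69, 56, 77, 40)

print(round(dev_a, 4), round(dev_b, 4))   # 6.4132 -6.4132
print(round(var_a, 4), round(var_b, 4))   # 14.5235 14.5235
```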

The test statistic, χ^{2}_{MH}, gets bigger as the differences between the observed and expected values get larger relative to their variance; for a given difference in proportions, it grows as the sample size gets bigger. Under the null hypothesis, it is approximately chi-square distributed with one degree of freedom.
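Because a chi-square variable with one degree of freedom is the square of a standard normal variable, its upper-tail P value can be computed with nothing more than the complementary error function: P(χ² > x) = erfc(√(x/2)). A minimal sketch using Python's standard `math.erfc`, checked against the two P values reported in the Examples section (the function name is illustrative):

```python
import math

def chi2_1df_p(statistic):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom.
    P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z."""
    return math.erfc(math.sqrt(statistic / 2))

print(round(chi2_1df_p(5.05), 3))   # 0.025 (mussel example)
print(f"{chi2_1df_p(24.39):.1e}")   # 7.9e-07 (elk example)
```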

Different sources present the formula for the Cochran–Mantel–Haenszel test in different forms, but they are all algebraically equivalent. The formula I've shown here includes the continuity correction (subtracting 0.5 in the numerator); sometimes the Cochran–Mantel–Haenszel test is done without the continuity correction, so you should be sure to specify whether you used it when reporting your results.
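A compact sketch of the whole statistic, with the continuity correction as an option, is below. The function name and the `continuity` flag are illustrative assumptions, and so is the guard that stops the correction from pushing the numerator below zero (the formula above does not address that edge case). With the mussel data from the Examples section it gives 5.05 with the correction and 5.3209 without it; the uncorrected value is the one PROC FREQ reports in the SAS output further down.

```python
def cmh_statistic(tables, continuity=True):
    """Cochran-Mantel-Haenszel chi-square for 2x2 tables given as (a, b, c, d).
    Every table must be arranged the same way (same meaning for cell a)."""
    total_dev = 0.0   # sum over tables of (observed - expected) for cell a
    total_var = 0.0   # sum over tables of the null variance of cell a
    for a, b, c, d in tables:
        n = a + b + c + d
        total_dev += a - (a + b) * (a + c) / n
        total_var += (a + b) * (a + c) * (b + d) * (c + d) / (n**3 - n**2)
    correction = 0.5 if continuity else 0.0
    # Assumed guard: never let the correction make the numerator negative.
    numerator = max(abs(total_dev) - correction, 0.0)
    return numerator**2 / total_var

# Mussel data: (94 marine, 94 estuarine, non-94 marine, non-94 estuarine)
mussels = [(56, 69, 40, 77),    # Tillamook
           (61, 257, 57, 301),  # Yaquina
           (73, 65, 71, 79),    # Alsea
           (71, 48, 55, 48)]    # Umpqua

print(round(cmh_statistic(mussels), 2))                    # 5.05
print(round(cmh_statistic(mussels, continuity=False), 4))  # 5.3209
```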

Some statisticians recommend that you test the homogeneity of the odds ratios in the different repeats, and if different repeats show significantly different odds ratios, you shouldn't do the Cochran–Mantel–Haenszel test. In our arthritis-legwarmers example, they would say that if legwarmers have a significantly different effect on pain in the different seasons, you should analyze each experiment separately, rather than all together as the Cochran–Mantel–Haenszel test does. The most common way to test the homogeneity of odds ratios is with the Breslow–Day test, which I won't cover here.

Other statisticians will tell you that it's perfectly okay to use the Cochran–Mantel–Haenszel test when the odds ratios are significantly heterogeneous. The different recommendations depend on what your goal is. If your main goal is hypothesis testing—you want to know whether legwarmers reduce pain, in our example—then the Cochran–Mantel–Haenszel test is perfectly appropriate. A significant result will tell you that yes, the proportion of people feeling ankle pain does depend on whether or not they're wearing legwarmers. If your main goal is estimation—you want to estimate how well legwarmers work and come up with a number like "people with ankle arthritis are 50% less likely to feel pain if they wear fluorescent pink polyester knit legwarmers"—then it would be inappropriate to combine the data using the Cochran–Mantel–Haenszel test. If legwarmers reduce pain by 70% in the winter, 50% in the spring, and 30% in the summer, it would be misleading to say that they reduce pain by 50%; instead, it would be better to say that they reduce pain, but the amount of pain reduction depends on the time of year.

### Examples

McDonald and Siebenaller (1989) surveyed allele frequencies at the *Lap* locus in the mussel *Mytilus trossulus* on the Oregon coast. At four estuaries, samples were taken from inside the estuary and from a marine habitat outside the estuary. There were three common alleles and a couple of rare alleles; based on previous results, the biologically interesting question was whether the *Lap*^{94} allele was less common inside estuaries, so all the other alleles were pooled into a "non-*94*" class.

There are three nominal variables: allele (94 or non-94), habitat (marine or estuarine), and area (Tillamook, Yaquina, Alsea, or Umpqua). The null hypothesis is that, within each area, there is no difference in the proportion of *Lap*^{94} alleles between the marine and estuarine habitats.

This table shows the number of *94* and non-*94* alleles at each location. There is a smaller proportion of *94* alleles in the estuarine location of each estuary when compared with the marine location; we wanted to know whether this difference is significant.

| Location | Allele | Marine | Estuarine |
|---|---|---|---|
| Tillamook | 94 | 56 | 69 |
| | non-94 | 40 | 77 |
| Yaquina | 94 | 61 | 257 |
| | non-94 | 57 | 301 |
| Alsea | 94 | 73 | 65 |
| | non-94 | 71 | 79 |
| Umpqua | 94 | 71 | 48 |
| | non-94 | 55 | 48 |

Applying the formula given above, the numerator is 355.84, the denominator is 70.47, so the result is χ^{2}_{MH}=5.05, 1 d.f., P=0.025. You can reject the null hypothesis that the proportion of *Lap*^{94} alleles is the same in the marine and estuarine locations.
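The arithmetic in the paragraph above can be reproduced location by location. This Python sketch prints each location's observed-minus-expected value and variance for cell *a* (Lap-94 alleles in the marine sample), then sums them and applies the continuity correction. The P value uses the 1-df chi-square tail, P = erfc(√(χ²/2)), from Python's standard `math` module.

```python
import math

# Counts per location: (94 marine, 94 estuarine, non-94 marine, non-94 estuarine)
mussels = {"Tillamook": (56, 69, 40, 77),
           "Yaquina": (61, 257, 57, 301),
           "Alsea": (73, 65, 71, 79),
           "Umpqua": (71, 48, 55, 48)}

total_dev = total_var = 0.0
for place, (a, b, c, d) in mussels.items():
    n = a + b + c + d
    dev = a - (a + b) * (a + c) / n                       # observed - expected
    var = (a + b) * (a + c) * (b + d) * (c + d) / (n**3 - n**2)
    total_dev += dev
    total_var += var
    print(f"{place:10s} obs-exp = {dev:7.4f}  variance = {var:8.4f}")

numerator = (abs(total_dev) - 0.5) ** 2                   # continuity-corrected
chi2 = numerator / total_var
p = math.erfc(math.sqrt(chi2 / 2))                        # 1-df chi-square tail
print(f"numerator = {numerator:.2f}, denominator = {total_var:.2f}")
print(f"chi2_MH = {chi2:.2f}, P = {p:.3f}")
```

The last two lines of output should read numerator = 355.84, denominator = 70.47 and chi2_MH = 5.05, P = 0.025, matching the values above.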

Gagnon et al. (2007) studied elk use of wildlife underpasses on a highway in Arizona. Using video surveillance cameras, they recorded each elk that started to cross under the highway. When a car or truck passed over while the elk was in the underpass, they recorded whether the elk continued through the underpass ("crossing") or turned around and left ("retreat"). The overall traffic volume was divided into low (fewer than 4 vehicles per minute) and high. There are three nominal variables: vehicle type (truck or car), traffic volume (low or high), and elk behavior (crossing or retreat). The question is whether trucks or cars are more likely to scare elk out of underpasses.

| Traffic | Vehicle | Crossing | Retreat |
|---|---|---|---|
| Low traffic | Car | 287 | 57 |
| | Truck | 40 | 42 |
| High traffic | Car | 237 | 52 |
| | Truck | 57 | 12 |

The result of the test is χ^{2}_{MH}=24.39, 1 d.f., P=7.9×10^{-7}. More elk are scared out of the underpasses by trucks than by cars.
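A similar Python sketch reproduces the elk result. Cell *a* is cars whose elk kept crossing, so each table is (car crossing, car retreat, truck crossing, truck retreat). It also prints the odds ratio for each traffic level as a purely descriptive look at whether the car/truck difference is similar in the two tables; judging whether the odds ratios differ significantly would need a homogeneity test such as Breslow–Day, which this sketch does not do.

```python
import math

# Each traffic level as (car crossing, car retreat, truck crossing, truck retreat)
elk = {"low": (287, 57, 40, 42), "high": (237, 52, 57, 12)}

total_dev = total_var = 0.0
odds = {}
for level, (a, b, c, d) in elk.items():
    n = a + b + c + d
    total_dev += a - (a + b) * (a + c) / n
    total_var += (a + b) * (a + c) * (b + d) * (c + d) / (n**3 - n**2)
    # Odds of crossing for cars (a/b) divided by odds of crossing for trucks (c/d).
    odds[level] = (a * d) / (b * c)
    print(f"{level} traffic: odds ratio = {odds[level]:.2f}")

chi2 = (abs(total_dev) - 0.5) ** 2 / total_var   # continuity-corrected
p = math.erfc(math.sqrt(chi2 / 2))                # 1-df chi-square tail
print(f"chi2_MH = {chi2:.2f}, P = {p:.1e}")
```

It should report odds ratios of about 5.29 at low traffic and 0.96 at high traffic, followed by chi2_MH = 24.39, P = 7.9e-07.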

### Graphing the results

To graph the results of a Cochran–Mantel–Haenszel test, pick one of the two values of the nominal variable that you're observing and plot its proportions on a bar graph, using bars of two different patterns.
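The numbers behind such a graph can be computed as follows, using the mussel counts from the Examples section. This sketch assumes a Wilson score interval for the 95% confidence limits; that is one common choice for a binomial proportion, and the method used for the published figure is not stated here. Drawing the bars is left to whatever plotting tool you use.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """95% Wilson score interval for a binomial proportion (an assumed choice,
    not necessarily the method behind the published figure)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Lap-94 counts and totals: (94 marine, all marine, 94 estuarine, all estuarine)
counts = {"Tillamook": (56, 96, 69, 146),
          "Yaquina": (61, 118, 257, 558),
          "Alsea": (73, 144, 65, 144),
          "Umpqua": (71, 126, 48, 96)}

bars = {}
for place, (k_m, n_m, k_e, n_e) in counts.items():
    # One bar per habitat: (proportion, lower limit, upper limit)
    bars[place] = {"marine": (k_m / n_m, *wilson_interval(k_m, n_m)),
                   "estuarine": (k_e / n_e, *wilson_interval(k_e, n_e))}
    for habitat, (prop, lo, hi) in bars[place].items():
        print(f"{place:10s} {habitat:9s} {prop:.3f}  ({lo:.3f}-{hi:.3f})")
```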

Figure: *Lap*^{94} allele proportions in the mussel *Mytilus trossulus* at four bays in Oregon. Gray bars are marine samples and empty bars are estuarine samples. Error bars are 95% confidence intervals.

### Similar tests

Sometimes the Cochran–Mantel–Haenszel test is just called the Mantel–Haenszel test. This is confusing, as there is also a test for homogeneity of odds ratios called the Mantel–Haenszel test, and a Mantel–Haenszel test of independence for one 2×2 table. Mantel and Haenszel (1959) came up with a fairly minor modification of the basic idea of Cochran (1954), so it seems appropriate (and somewhat less confusing) to give Cochran credit in the name of this test.

If you have at least six 2×2 tables, and you're only interested in the *direction* of the differences in proportions, not the size of the differences, you could do a sign test. See the sign test web page for an example of an experiment with a design very similar to the *Lap* in *Mytilus trossulus* experiment described above, where, because of the different biology of the organism, a sign test was more appropriate.
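The reason for needing at least six tables is easy to check with an exact two-sided sign test: even if every table differs in the same direction, the P value is 2 × 0.5^n, which first drops below 0.05 at n = 6. A small sketch using Python's standard `math.comb` (the function name is illustrative):

```python
from math import comb

def sign_test_p(same_direction, n_tables):
    """Two-sided exact sign test: probability of a split at least this lopsided
    if each table were equally likely to differ in either direction."""
    k = max(same_direction, n_tables - same_direction)
    tail = sum(comb(n_tables, i) for i in range(k, n_tables + 1)) / 2**n_tables
    return min(1.0, 2 * tail)

for n in range(4, 8):
    # Best possible case: every table differs in the same direction.
    print(n, sign_test_p(n, n))
```

It prints 0.125, 0.0625, 0.03125 and 0.015625 for four to seven tables. For the four mussel locations, even a unanimous direction would give only P = 0.125.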

The Cochran–Mantel–Haenszel test for nominal variables is analogous to a two-way anova or paired t-test for a measurement variable, or a Wilcoxon signed-rank test for rank data. In the arthritis-legwarmers example, if you measured ankle pain on a 10-point scale (a measurement variable) instead of categorizing it as pain/no pain, you'd analyze the data with a two-way anova.

### How to do the test

#### Spreadsheet

I've written a spreadsheet to perform the Cochran–Mantel–Haenszel test. It handles up to fifty 2×2 tables (and you should be able to modify it to handle more, if necessary).

#### Web pages

I'm not aware of any web pages that will perform the Cochran–Mantel–Haenszel test.

#### SAS

Here is a SAS program that uses PROC FREQ for a Cochran–Mantel–Haenszel test. It uses the mussel data from above. In the TABLES statement, the variable that labels the repeats is listed first; in this case it is LOCATION.

```sas
data lap;
   /* Without a LENGTH statement, list input truncates character values to
      8 characters ("Tillamook" -> "Tillamoo", "estuarine" -> "estuarin"). */
   length location habitat $ 9 allele $ 6;
   input location $ habitat $ allele $ count;
   cards;
Tillamook marine 94 56
Tillamook estuarine 94 69
Tillamook marine non-94 40
Tillamook estuarine non-94 77
Yaquina marine 94 61
Yaquina estuarine 94 257
Yaquina marine non-94 57
Yaquina estuarine non-94 301
Alsea marine 94 73
Alsea estuarine 94 65
Alsea marine non-94 71
Alsea estuarine non-94 79
Umpqua marine 94 71
Umpqua estuarine 94 48
Umpqua marine non-94 55
Umpqua estuarine non-94 48
;
proc freq data=lap;
   weight count / zeros;
   tables location*habitat*allele / cmh;
run;
```

There is a lot of output, but the important part looks like this:

```
Cochran–Mantel–Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1      5.3209    0.0211
    2        Row Mean Scores Differ     1      5.3209    0.0211
    3        General Association        1      5.3209    0.0211
```

For repeated 2×2 tables, the three statistics are identical; they are the Cochran–Mantel–Haenszel chi-square statistic, *without* the continuity correction. For repeated tables with more than two rows or columns, the "general association" statistic is used when the values of the different nominal variables do not have an order (you cannot arrange them from smallest to largest); you should use it unless you have a good reason to use one of the other statistics.

### Further reading

Sokal and Rohlf, pp. 764-766.

### References

Cochran, W.G. 1954. Some methods for strengthening the common χ^{2} tests. Biometrics 10: 417-451.

Gagnon, J.W., T.C. Theimer, N.L. Dodd, A.L. Manzon, and R.E. Schweinsburg. 2007. Effects of traffic on elk use of wildlife underpasses in Arizona. J. Wildl. Manage. 71: 2324-2328.

Mantel, N., and W. Haenszel. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst. 22: 719-748.

McDonald, J.H., and J.F. Siebenaller. 1989. Similar geographic variation at the *Lap* locus in the mussels *Mytilus trossulus* and *M. edulis.* Evolution 43: 228-231.


This page was last revised September 12, 2009. Its address is http://udel.edu/~mcdonald/statcmh.html. It may be cited as pp. 88-94 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.