Fisher's exact test of independence

When to use it

Fisher's exact test is used when you have two nominal variables. A data set like this is often called an "R×C table," where R is the number of rows and C is the number of columns. Fisher's exact test is more accurate than the chi-squared test or G-test of independence when the expected numbers are small. See the web page on small sample sizes for further discussion.

The most common use of Fisher's exact test is for 2×2 tables, so that's mostly what I'll describe here. You can also do Fisher's exact test on tables with more than two rows or columns.

Null hypothesis

The null hypothesis is that the relative proportions of one variable are independent of the second variable. For example, if you counted the number of male and female mice in two barns, the null hypothesis would be that the proportion of male mice is the same in the two barns.

How it works

The hypergeometric distribution is used to calculate the probability of getting the observed data, and all data sets with more extreme deviations, under the null hypothesis that the proportions are the same. For example, if one barn has 3 male and 7 female mice, and the other barn has 15 male and 5 female mice, the probability of getting 3 males in the first barn and 15 males in the second, or 2 and 16, or 1 and 17, or 0 and 18, is calculated. For the usual two-tailed test, the probability of getting deviations as extreme as the observed, but in the opposite direction, is also calculated. This is an exact calculation of the probability; unlike most statistical tests, there is no intermediate step of calculating a test statistic whose probability is approximately known.
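To make the calculation concrete, here is a minimal Python sketch of the 2×2 case using only the standard library. The function name `fisher_exact_2x2` is my own, not taken from any package. For the two-tailed value I've assumed one common convention: add up every table (with the same row and column totals) whose probability is no greater than that of the observed table. Other definitions of "as extreme" exist and can give slightly different answers.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (lower_tail, upper_tail, two_tailed) P values for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)  # all ways to choose who lands in column 1

    def ways(k):
        # Ways to get k in the top-left cell with every margin held fixed
        # (the numerator of the hypergeometric probability)
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    lower = sum(ways(k) for k in range(k_min, a + 1))   # observed or fewer
    upper = sum(ways(k) for k in range(a, k_max + 1))   # observed or more
    # Two-tailed (assumed convention): tables no more probable than the observed.
    # Integer counts are compared, so ties are handled exactly.
    two = sum(ways(k) for k in range(k_min, k_max + 1) if ways(k) <= ways(a))
    return lower / total, upper / total, two / total

# Barn example from the text: rows are barns, columns are male and female
lower, upper, two_tailed = fisher_exact_2x2(3, 7, 15, 5)
print(f"one-tailed (3 or fewer males in barn 1): {lower:.4g}")
print(f"two-tailed: {two_tailed:.4g}")
```

The lower tail here is exactly the sum described above: 3 and 15, 2 and 16, 1 and 17, and 0 and 18 males.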

When there are more than two rows or columns, you have to decide how you measure deviations from the null expectation, so you can tell what data sets would be more extreme than the observed. The usual method is to calculate the chi-square statistic (formally, it's the Pearson chi-square statistic) for each possible set of numbers, and those with chi-square values equal to or greater than the observed data are considered as extreme as the observed data.
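The ordering by chi-square can be sketched in Python for a table with two rows and any number of columns. This is only an illustration under some assumptions: the function name `exact_chisq_2xC` is hypothetical, no row or column total may be zero, and it enumerates every possible table by brute force, so it is only practical for small samples.

```python
from math import comb

def exact_chisq_2xC(row1, row2):
    """Exact P value for a 2 x C table, ordering tables by Pearson chi-square."""
    cols = [x + y for x, y in zip(row1, row2)]   # column totals (all must be > 0)
    n1, n2 = sum(row1), sum(row2)                # row totals (both must be > 0)
    n = n1 + n2

    def chi_square(top):
        # Pearson chi-square for the table whose first row is `top`
        stat = 0.0
        for x, col in zip(top, cols):
            e1, e2 = n1 * col / n, n2 * col / n
            stat += (x - e1) ** 2 / e1 + ((col - x) - e2) ** 2 / e2
        return stat

    observed = chi_square(row1)
    extreme_ways = 0

    def walk(j, remaining, top, ways):
        # Fill the first row one column at a time, keeping all margins fixed
        nonlocal extreme_ways
        if j == len(cols) - 1:
            if remaining <= cols[j]:
                # Small tolerance so floating-point ties count as "as extreme"
                if chi_square(top + [remaining]) >= observed - 1e-9:
                    extreme_ways += ways * comb(cols[j], remaining)
            return
        for x in range(min(cols[j], remaining) + 1):
            walk(j + 1, remaining - x, top + [x], ways * comb(cols[j], x))

    walk(0, n1, [], 1)
    return extreme_ways / comb(n, n1)

# Heron and egret substrate counts from the Examples section
print(round(exact_chisq_2xC([15, 20, 14, 6], [8, 5, 7, 1]), 4))
```

For the heron and egret data this should agree with the P of about 0.54 reported in the Examples section.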

(Note—Fisher's exact test assumes that the row and column totals are fixed. An example would be putting 12 female hermit crabs and 9 male hermit crabs in an aquarium with 7 red snail shells and 14 blue snail shells, then counting how many crabs of each sex chose each color. The total number of female crabs is fixed at 12, and the total number of male crabs, red shells, and blue shells are also fixed. There are few biological experiments where this assumption is true. In the much more common design, the row totals and/or column totals are free to vary. For example, if you took a sample of mice from two barns and counted the number of males and females, you wouldn't know the total number of male mice before doing the experiment; it would be free to vary. In this case, the Fisher's exact test is not, strictly speaking, exact. It is still considered to be more accurate than the chi-square or G-test, and you should feel comfortable using it for any test of independence with small numbers.)

Examples

McDonald and Kreitman (1991) sequenced the alcohol dehydrogenase gene in several individuals of three species of Drosophila. Varying sites were classified as synonymous (the nucleotide variation does not change an amino acid) or amino acid replacements, and they were also classified as polymorphic (varying within a species) or fixed differences between species. The two nominal variables are thus synonymicity ("synonymous" or "replacement") and fixity ("polymorphic" or "fixed"). In the absence of natural selection, the ratio of synonymous to replacement sites should be the same for polymorphisms and fixed differences. There were 43 synonymous polymorphisms, 2 replacement polymorphisms, 17 synonymous fixed differences, and 7 replacement fixed differences.

| | Synonymous | Replacement |
|---|---|---|
| Polymorphisms | 43 | 2 |
| Fixed | 17 | 7 |

The result is P=0.0067, indicating that the null hypothesis can be rejected; there is a significant difference in synonymous/replacement ratio between polymorphisms and fixed differences.

 Eastern chipmunk, Tamias striatus.

The eastern chipmunk trills when pursued by a predator, possibly to warn other chipmunks. Burke da Silva et al. (2002) released chipmunks either 10 or 100 meters from their home burrow, then chased them (to simulate predator pursuit). Out of 24 female chipmunks released 10 m from their burrow, 16 trilled and 8 did not trill. When released 100 m from their burrow, only 3 female chipmunks trilled, while 18 did not trill. Applying Fisher's exact test, the proportion of chipmunks trilling is significantly higher (P=0.0007) when they are closer to their burrow.

Descamps et al. (2009) tagged 50 king penguins (Aptenodytes patagonicus) in each of three nesting areas (lower, middle, and upper) on Possession Island in the Crozet Archipelago, then counted the number that were still alive a year later. Seven penguins had died in the lower area, six had died in the middle area, and only one had died in the upper area. Descamps et al. analyzed the data with a G-test of independence, yielding a significant (P=0.048) difference in survival among the areas; however, analyzing the data with Fisher's exact test yields a non-significant (P=0.090) result.

Custer and Galli (2002) flew a light plane to follow great blue herons (Ardea herodias) and great egrets (Casmerodius albus) from their resting site to their first feeding site at Peltier Lake, Minnesota, and recorded the type of substrate each bird landed on.

| | Heron | Egret |
|---|---|---|
| Vegetation | 15 | 8 |
| Shoreline | 20 | 5 |
| Water | 14 | 7 |
| Structures | 6 | 1 |

Fisher's exact test yields P=0.54, so there is no evidence that the two species of birds use the substrates in different proportions.

Graphing the results

You plot the results of Fisher's exact test the same way you would for any other test of independence.

Similar tests

The chi-squared test of independence or the G-test of independence may be used on the same kind of data as Fisher's exact test. When some of the expected values are small, Fisher's exact test is more accurate than the chi-squared or G-test of independence. If all of the expected values are very large, Fisher's exact test becomes computationally impractical; fortunately, the chi-squared or G-test will then give an accurate result. See the web page on small sample sizes for further discussion.

If the number of rows, number of columns, or total sample size becomes too large, the program you're using may not be able to perform the calculations for Fisher's exact test in a reasonable length of time, or it may fail entirely. If Fisher's exact test doesn't work, you can use the randomization test of independence.

McNemar's test is used when the two samples are not independent, but instead are two sets of observations on the same individuals. For example, let's say you have 92 children who don't like broccoli and 77 children who like broccoli. You give them your new BroccoYum™ pills for a week, then observe that 14 of the children switched from not liking broccoli before taking the pills to liking broccoli after taking the pills. Three of the children switched in the opposite direction (from liking broccoli to not liking broccoli), and the remaining children stayed the same. The statistical null hypothesis is that the number of switchers in one direction is equal to the number of switchers in the opposite direction. McNemar's test compares the observed data to the null expectation using a goodness-of-fit test. The numbers are almost always small enough that you can make this comparison using the exact binomial test. For the example data of 14 switchers in one direction and 3 in the other direction, P=0.013.
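The exact binomial comparison for the switchers is short enough to sketch in Python. The function name `mcnemar_exact` is hypothetical, and I've assumed the two-tailed P value is twice the smaller tail, which is appropriate here because the null expectation is a symmetric 1:1 ratio. Only the children who switched enter the calculation.

```python
from math import comb

def mcnemar_exact(switched_one_way, switched_other_way):
    """Two-tailed exact binomial P value for a 1:1 null ratio of switchers."""
    n = switched_one_way + switched_other_way
    k = min(switched_one_way, switched_other_way)
    # Probability of k or fewer switchers in the rarer direction when p = 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for the two-tailed test, capping at 1
    return min(1.0, 2 * tail)

# Broccoli example from the text: 14 switched one way, 3 the other
print(round(mcnemar_exact(14, 3), 3))
```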

How to do the test

I've written a spreadsheet to perform Fisher's exact test for 2×2 tables. It handles samples with the smaller column total less than 500. [An earlier version of this spreadsheet gave slightly inaccurate P-values for small sample sizes. I fixed it on July 4, 2009. Thanks to Patrick Spagon for pointing out the error.]

Web pages

Several people have created web pages that perform Fisher's exact test for 2×2 tables. I like Øyvind Langsrud's web page for Fisher's exact test. Just enter the numbers into the cells on the web page, hit the Compute button, and get your answer. You should almost always use the "2-tail p-value" given by the web page.

There is also a web page for Fisher's exact test for up to 6×6 tables. It will only take data with fewer than 100 observations in each cell.

SAS

Here is a SAS program that uses PROC FREQ for a Fisher's exact test. It uses the chipmunk data from above.

```
data chipmunk;
input distance $ sound $ count;
cards;
10m  trill   16
10m  notrill  8
100m trill    3
100m notrill 18
;
proc freq data=chipmunk;
weight count / zeros;
tables distance*sound / chisq;
run;

```

The output includes the following:

```
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        18
Left-sided Pr <= F          1.0000
Right-sided Pr >= F      4.321E-04

Table Probability (P)    4.012E-04
Two-sided Pr <= P        6.862E-04

```

The "Two-sided Pr <= P" is the two-tailed P-value that you want.

SAS automatically does Fisher's exact test for 2×2 tables. For greater numbers of rows or columns, you add a line saying exact chisq;. Here is an example using the data on heron and egret substrate use from above:

```
data birds;
input bird $ substrate $ count;
cards;
heron vegetation 15
heron shoreline  20
heron water      14
heron structures  6
egret vegetation  8
egret shoreline   5
egret water       7
egret structures  1
;
proc freq data=birds;
weight count / zeros;
tables bird*substrate / chisq;
exact chisq;
run;

```

The results of the exact test are labelled "Exact Pr >= ChiSq"; in this case, P=0.5357.

```
Pearson Chi-Square Test
----------------------------------
Chi-Square                  2.2812
DF                               3
Asymptotic Pr >  ChiSq      0.5161
Exact      Pr >= ChiSq      0.5357

```

Power analysis

The G*Power program will calculate the sample size needed for Fisher's exact test. Choose "Exact" from the "Test family" menu and "Proportions: Inequality, two independent groups (Fisher's exact test)" from the "Statistical test" menu. Enter the proportions you hope to see, your alpha (usually 0.05) and your power (usually 0.80 or 0.90). If you plan to have more observations in one group than in the other, you can make the "Allocation ratio" different from 1.

As an example, let's say you're looking for a relationship between bladder cancer and genotypes at a polymorphism in the catechol-O-methyltransferase gene in humans. Based on previous research, you're going to pool together the GG and GA genotypes and compare these "GG+GA" and AA genotypes. In the population you're studying, you know that the genotype frequencies in people without bladder cancer are 0.84 GG+GA and 0.16 AA; you want to know how many people with bladder cancer you'll have to genotype to get a significant result if they have 6 percent more AA genotypes. It's easier to find controls than people with bladder cancer, so you're planning to have twice as many people without bladder cancer. On the G*Power page, enter 0.16 for proportion p1, 0.22 for proportion p2, 0.05 for alpha, 0.80 for power, and 0.5 for allocation ratio. The result is a total sample size of 1523; rounding up so that the groups stay in a 2:1 ratio, you'll need 508 people with bladder cancer and 1016 people without bladder cancer.

Note that the sample size will be different if your effect size is a 6 percent lower frequency of AA in bladder cancer patients, instead of 6 percent higher. If you don't have a strong idea about which direction of difference you're going to see, you should do the power analysis both ways and use the larger sample size estimate.

Further reading

Sokal and Rohlf, pp. 734-736.

Zar, pp. 543-555.

References

Picture of chipmunk from Catesby, M. 1731. Natural History of Carolina, Florida and the Bahama Islands, via Wikimedia Commons.

Burke da Silva, K., C. Mahan, and J. da Silva. 2002. The trill of the chase: eastern chipmunks call to warn kin. J. Mammal. 83: 546-552.

Custer, C.M., and J. Galli. 2002. Feeding habitat selection by great blue herons and great egrets nesting in east central Minnesota. Waterbirds 25: 115-124.

Descamps, S., C. le Bohec, Y. le Maho, J.-P. Gendner, and M. Gauthier-Clerc. 2009. Relating demographic performance to breeding-site location in the king penguin. Condor 111: 81-87.

McDonald, J.H. and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.