Exact test for goodness-of-fit
The main goal of a statistical test is to answer the question, "What is the probability of getting a result like my observed data, if the null hypothesis were true?" If it is very unlikely to get the observed data under the null hypothesis, we reject the null hypothesis.
Most statistical tests take the following form:
- Collect the data.
- Calculate a number, the test statistic, that measures how far the observed data deviate from the expectation under the null hypothesis.
- Use a mathematical function to estimate how likely it is to get a test statistic as extreme as the one you observed, if the null hypothesis were true. This is the P-value.
Exact tests, such as the exact test for goodness-of-fit, are different. There is no test statistic; instead, the probability of obtaining the observed data under the null hypothesis is calculated directly. This is because the predictions of the null hypothesis are so simple that the probabilities can easily be calculated.
When to use it
You use the exact binomial test when you have one nominal variable with only two values (such as male vs. female, left vs. right, green vs. yellow). The observed data are compared with the expected data, which are some kind of theoretical expectation (such as a 1:1 sex ratio or a 3:1 ratio in a genetic cross) that is determined before the data are collected. If the total number of observations is too high (around a thousand), computers may not be able to do the calculations for the exact test, and a G-test or chi-square test of goodness-of-fit must be used instead (and will give almost exactly the same result).
You can do exact multinomial tests of goodness-of-fit when the nominal variable has more than two values. The basic concepts are the same as for the exact binomial test. Here I'm limiting the explanation to the binomial test, because it's more commonly used and easier to understand.
Null hypothesis
For a two-tailed test, which is what you almost always should use, the null hypothesis is that the number of observations in each category is equal to that predicted by a biological theory, and the alternative hypothesis is that the observed data are different from the expected. If you are doing a one-tailed test, the null hypothesis is that the observed number for one category is equal to or less than the expected; the alternative hypothesis is that the observed number in that category is greater than expected.
How the test works
Let's say you want to know whether our cat, Gus, has a preference for one paw or uses both paws equally. You dangle a ribbon in his face and record which paw he uses to bat at it. You do this 10 times, and he bats at the ribbon with his right paw 8 times and his left paw 2 times. Then he gets bored with the experiment and leaves. Can you conclude that he is right-pawed, or could this result have occurred due to chance under the null hypothesis that he bats equally with each paw?
The null hypothesis is that 0.5 of the time, Gus will use his right paw. The probability that he will use his right paw on the first time is 0.5. The probability that he will use his right paw the first time AND the second time is 0.5 × 0.5, or $0.5^2$, or 0.25. The probability that he will use his right paw all ten times is $0.5^{10}$, or about 0.001.
For a mixture of right and left paws, the calculation is more complicated. Where n is the total number of trials, k is the number of "successes" (statistical jargon for whichever event you want to consider), p is the expected proportion of successes if the null hypothesis is true, and Y is the probability of getting k successes in n trials, the equation is:
$$Y = \frac{p^{k}\,(1-p)^{(n-k)}\,n!}{k!\,(n-k)!}$$
Fortunately, there's a spreadsheet function that does the calculation for you. To calculate the probability of getting exactly 8 out of 10 right paws, you would enter
=BINOMDIST(2, 10, 0.5, FALSE)
The first number, 2, is whichever event there are fewer than expected of; in this case, there are only two uses of the left paw, which is fewer than the expected 5. The second number is the total number of trials. The third number is the expected proportion of whichever event there were fewer than expected of. And FALSE tells it to calculate the exact probability for that number of events only. In this case, the answer is P=0.044, so you might think it was significant at the P<0.05 level.
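As a cross-check on the spreadsheet result, the equation above can be sketched in Python using only the standard library. The function name `binomial_probability` is an assumption made for this illustration; it is not part of any spreadsheet or statistics package.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n trials with success proportion p."""
    # comb(n, k) is n! / (k! (n-k)!), the coefficient in the equation above.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Gus used his left paw 2 times out of 10; the null proportion is 0.5.
print(round(binomial_probability(2, 10, 0.5), 3))  # 0.044
```

This should agree with the value returned by BINOMDIST with FALSE as the last argument.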
However, it would be incorrect to only calculate the probability of getting exactly 2 left paws and 8 right paws. Instead, you must calculate the probability of getting a deviation from the null expectation as large as, or larger than, the observed result. So you must calculate the probability that Gus used his left paw 2 times out of 10, or 1 time out of 10, or 0 times out of ten. Adding these probabilities together gives P=0.055, which is not quite significant at the P<0.05 level. You do this in a spreadsheet by entering
=BINOMDIST(2, 10, 0.5, TRUE)
The "TRUE" parameter tells the spreadsheet to calculate the sum of the probabilities of the observed number and all smaller numbers, which here are the more extreme values; it's the equivalent of
=BINOMDIST(2, 10, 0.5, FALSE)+BINOMDIST(1, 10, 0.5, FALSE)+BINOMDIST(0, 10, 0.5, FALSE)
There's one more thing. The above calculation gives the total probability of getting 2, 1, or 0 uses of the left paw out of 10. However, the alternative hypothesis is that the number of uses of the right paw is not equal to the number of uses of the left paw. If there had been 2, 1, or 0 uses of the right paw, that also would have been an equally extreme deviation from the expectation. So you must add the probability of getting 2, 1, or 0 uses of the right paw, to account for both tails of the probability distribution; you are doing a two-tailed test. This gives you P=0.109, which is not very close to being significant. (If the null hypothesis had been 0.5 or more uses of the left paw, and the alternative hypothesis had been less than 0.5 uses of the left paw, you could do a one-tailed test and use P=0.055. But you almost never have a situation where a one-tailed test is appropriate.)
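The one-tailed and two-tailed values for Gus can be reproduced with a short Python sketch. The name `lower_tail` is hypothetical, and the doubling step assumes a 1:1 null hypothesis, where the two tails are mirror images of each other.

```python
from math import comb

def lower_tail(k, n, p):
    """Probability of k or fewer successes in n trials (the cumulative probability)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

one_tailed = lower_tail(2, 10, 0.5)   # 2, 1, or 0 uses of the left paw
two_tailed = 2 * one_tailed           # add the mirror-image tail (1:1 null only)
print(round(one_tailed, 4), round(two_tailed, 4))  # 0.0547 0.1094
```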
[Figure: Graph showing the probability distribution for the binomial with 10 trials.]
The most common use of an exact binomial test is when the null hypothesis is that numbers of the two outcomes are equal. In that case, the meaning of a two-tailed test is clear, and the two-tailed P-value is found by multiplying the one-tailed P-value times two.
When the null hypothesis is not a 1:1 ratio, but something like a 3:1 ratio, the meaning of a two-tailed exact binomial test is not agreed upon; different statisticians, and different statistical programs, have slightly different interpretations and sometimes give different results for the same data. My spreadsheet adds the probabilities of all possible outcomes that are no more likely than the observed numbers; this method of small P-values is preferred by most statisticians.
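The method of small P-values can be sketched in Python as follows. This is an illustration of the idea, not the spreadsheet's actual code; the function names are assumptions, and the 3:1 example uses made-up counts (5 successes in 10 trials) chosen only to show that the two conventions can disagree.

```python
from math import comb

def binomial_probabilities(n, p):
    """Probability of each possible number of successes, 0 through n."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def small_p_values(k, n, p):
    """Two-tailed P-value: total probability of every outcome that is
    no more likely than the observed one."""
    probs = binomial_probabilities(n, p)
    cutoff = probs[k] * (1 + 1e-7)   # tolerance so exact ties are not lost to rounding
    return sum(q for q in probs if q <= cutoff)

# Gus (1:1 null): the result matches doubling the one-tailed value.
print(round(small_p_values(2, 10, 0.5), 4))    # 0.1094

# Hypothetical 3:1 null, 5 successes in 10 trials: the two conventions disagree.
doubled = 2 * sum(binomial_probabilities(10, 0.75)[:6])
print(round(small_p_values(5, 10, 0.75), 4), round(doubled, 4))   # 0.1344 0.1563
```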
Examples
Mendel crossed pea plants that were heterozygotes for green pod/yellow pod; pod color is the nominal variable, with "green" and "yellow" as the values. If this is inherited as a simple Mendelian trait, with green dominant over yellow, the expected ratio in the offspring is 3 green:1 yellow. He observed 428 green and 152 yellow. The expected numbers of plants under the null hypothesis are 435 green and 145 yellow, so Mendel observed slightly fewer green-pod plants than expected. The P-value for an exact binomial test is 0.533, indicating that the null hypothesis cannot be rejected; there is no significant difference between the observed and expected frequencies of pea plants with green pods.
Roptrocerus xylophagorum is a parasitoid of bark beetles. To determine what cues these wasps use to find the beetles, Sullivan et al. (2000) placed female wasps in the base of a Y-shaped tube, with a different odor in each arm of the Y, then counted the number of wasps that entered each arm of the tube. In one experiment, one arm of the Y had the odor of bark being eaten by adult beetles, while the other arm of the Y had bark being eaten by larval beetles. Ten wasps entered the area with the adult beetles, while 17 entered the area with the larval beetles. The difference from the expected 1:1 ratio is not significant (P=0.248). In another experiment that compared infested bark with a mixture of infested and uninfested bark, 36 wasps moved towards the infested bark, while only 7 moved towards the mixture; this is significantly different from the expected ratio (P=$9 \times 10^{-6}$).
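The two wasp results can be checked with a short Python sketch. The helper name `two_tailed_equal_ratio` is an assumption made for this illustration, and the calculation applies only to a 1:1 null hypothesis, where the two-tailed value is twice the smaller tail.

```python
from math import comb

def two_tailed_equal_ratio(a, b):
    """Two-tailed exact binomial P-value for counts a and b under a 1:1 null."""
    n = a + b
    smaller = min(a, b)
    # Under a 1:1 null every sequence of n choices has probability 1 / 2**n.
    one_tail = sum(comb(n, i) for i in range(smaller + 1)) / 2**n
    return min(1.0, 2 * one_tail)   # cap at 1 in case the counts are equal

print(round(two_tailed_equal_ratio(10, 17), 3))   # 0.248
print(f"{two_tailed_equal_ratio(36, 7):.0e}")     # 9e-06
```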
Graphing the results
You plot the results of an exact test the same way you would any other goodness-of-fit test.
Similar tests
A G-test or chi-square goodness-of-fit test could also be used for the same data as the exact test of goodness-of-fit. Where the expected numbers are small, the exact test will give more accurate results than the G-test or chi-square test. Where the sample size is large (over a thousand), attempting to use the exact test may give error messages (computers have a hard time calculating factorials for large numbers), so a G-test or chi-square test must be used. For intermediate sample sizes, all three tests give approximately the same results. I recommend that you use the exact test when n is less than 1000; see the web page on small sample sizes for further discussion.
The exact test and randomization test should give you the same result, if you do enough replicates for the randomization test, so the choice between them is a matter of personal preference. The exact test sounds more "exact"; the randomization test may be easier to understand and explain.
The sign test is a particular application of the exact binomial test. It is usually used when observations of a measurement variable are made in pairs (such as right-vs.-left or before-vs.-after), and only the direction of the difference, not the size of the difference, is of biological interest.
The exact test for goodness-of-fit is not the same as Fisher's exact test of independence. A test of independence is used for two nominal variables, such as sex and location. If you wanted to compare the ratio of male to female students at Delaware to the male:female ratio at Maryland, you would use a test of independence; if you want to compare the male:female ratio at Delaware to a theoretical 1:1 ratio, you would use a goodness-of-fit test.
Power analysis
For the exact binomial test, you can do the power analysis with this power analysis for proportions web page. This web page is set up for one-tailed tests, rather than the more common two-tailed tests, so enter alpha = 2.5 instead of alpha = 5 percent. Note that if the null expectation is not a 1:1 ratio, you will get slightly different results, depending on whether you make the observed proportion smaller or larger than the expected; use whichever gives you a larger sample size.
If your nominal variable has more than two values, use this power and sample size page. It is designed for chi-square tests, not exact tests, but the sample sizes will be very close. Choose "Generic chi-square test" from the box on the left side of the page (if you don't see the list of tests, make sure your web browser has Java turned on). Under "Prototype data," enter the chi-square value and sample size for some fake data. For example, if you're doing a genetic cross with an expected 1:2:1 ratio, and your minimum effect size is 10 percent more heterozygotes than expected, use the chi-square spreadsheet to do a chi-square test on observed numbers of 20:60:20 compared to expected proportions of 1:2:1. The spreadsheet gives you a chi-square value of 4.00 and an n of 100, which you enter under "Prototype data". Then set d (the degrees of freedom) equal to 2, and leave alpha at 0.05. The sliders can then be slid back and forth to yield the desired result. For example, if you slide the Power to 0.90, n is equal to 316. Note that the absolute values of the prototype data don't matter, only their relative relationship; you could have used 200:600:200, which would give you a chi-square value of 40.0 and an n of 1000, and gotten the exact same result.
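The prototype chi-square values quoted above can be reproduced with a few lines of Python. The function name `chi_square_statistic` is an assumption for this sketch, which computes only the test statistic, not the power calculation done by the web page.

```python
def chi_square_statistic(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for observed counts against an expected ratio."""
    n = sum(observed)
    total = sum(expected_ratio)
    expected = [n * r / total for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

small = chi_square_statistic([20, 60, 20], [1, 2, 1])
large = chi_square_statistic([200, 600, 200], [1, 2, 1])
print(small, large)                # 4.0 40.0
print(small / 100, large / 1000)   # 0.04 0.04
```

The chi-square value divided by n is the same for both sets of prototype data, which is why only the relative proportions matter.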
How to do the test
Spreadsheet
I have set up a spreadsheet that performs the exact binomial test for sample sizes up to 1000. It is self-explanatory.
Web page
Richard Lowry has set up a web page that does the exact binomial test. I'm not aware of any web pages that will do exact multinomial tests.
SAS
Here is a sample SAS program, showing how to do the exact binomial test on the Gus data. The p=0.5 gives the expected proportion of whichever value of the nominal variable is alphabetically first; in this case, it gives the expected proportion of "left."
The SAS exact binomial function finds the two-tailed P-value by doubling the P-value of one tail. The binomial distribution is not symmetrical when the expected proportion is other than 50 percent, so the technique SAS uses isn't as good as the method of small P-values. I don't recommend doing the exact binomial test in SAS when the expected proportion is anything other than 50 percent.
    data gus;
       input paw $;
       cards;
    right
    left
    right
    right
    right
    right
    left
    right
    right
    right
    ;
    proc freq data=gus;
       tables paw / binomial(p=0.5);
       exact binomial;
    run;
Near the end of the output is this:
    Exact Test
    One-sided Pr <=  P             0.0547
    Two-sided = 2 * One-sided      0.1094
The "Two-sided=2*One-sided" number is the two-tailed P-value that you want.
If you have the total numbers, rather than the raw values, you'd use a "weight" parameter in PROC FREQ:
    data gus;
       input paw $ count;
       cards;
    right 8
    left 2
    ;
    proc freq data=gus;
       weight count;
       tables paw / binomial(p=0.5);
       exact binomial;
    run;
This example shows how to do the exact multinomial test. The numbers are imaginary data from a genetic cross in which you expect a 1:2:1 ratio of red:pink:white. The testp=(0.5 0.25 0.25) lists the expected proportions for the values of the nominal variable in alphabetical order, so in this case it's pink first, then red, then white.
    data flowers;
       input color $ count;
       cards;
    red 5
    pink 2
    white 1
    ;
    proc freq data=flowers;
       weight count;
       tables color / chisq testp=(0.5 0.25 0.25);
       exact chisq;
    run;
The P-value you want is labelled "Exact Pr >= ChiSq":
Chi-Square Test
for Specified Proportions
---------------------------------------
Chi-Square 6.0000
DF 2
Asymptotic Pr > ChiSq 0.0498
Exact Pr >= ChiSq 0.0596
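For readers without SAS, the exact value in this output can be approximated with a short Python sketch. It assumes that the exact P-value is the total probability of every possible outcome whose chi-square statistic is at least as large as the observed one; the function name `exact_multinomial_test` is hypothetical, and enumerating every outcome is practical only for small samples.

```python
from math import factorial

def exact_multinomial_test(observed, proportions):
    """Exact goodness-of-fit P-value: total probability of all outcomes whose
    chi-square statistic is at least as large as the observed one."""
    n = sum(observed)
    expected = [n * p for p in proportions]

    def chi_square(counts):
        return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

    def probability(counts):
        # Multinomial probability: n! / (c1! c2! ...) * p1^c1 * p2^c2 * ...
        coefficient = factorial(n)
        prob = 1.0
        for c, p in zip(counts, proportions):
            coefficient //= factorial(c)
            prob *= p ** c
        return coefficient * prob

    def outcomes(total, categories):
        # Yield every way to split `total` observations among `categories` groups.
        if categories == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in outcomes(total - first, categories - 1):
                yield (first,) + rest

    cutoff = chi_square(observed) - 1e-9   # tolerance for rounding error
    return sum(probability(c) for c in outcomes(n, len(observed))
               if chi_square(c) >= cutoff)

# Flower data in alphabetical order: pink, red, white
print(round(exact_multinomial_test([2, 5, 1], [0.5, 0.25, 0.25]), 4))  # 0.0596
```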
Further reading
Sokal and Rohlf, pp. 686-687.
Zar, pp. 533-538.
References
Mendel, G. 1865. Experiments in plant hybridization. Available at MendelWeb.
Sullivan, B.T., E.M. Pettersson, K.C. Seltmann, and C.W. Berisford. 2000. Attraction of the bark beetle parasitoid Roptrocerus xylophagorum (Hymenoptera: Pteromalidae) to host-associated olfactory cues. Env. Entom. 29: 1138-1151.
This page was last revised January 2, 2008. Its address is http://udel.edu/~mcdonald/statexactbin.html.
©2007-2008 by John H. McDonald. You can probably do what you want with this content; see the permissions page at http://udel.edu/~mcdonald/statpermissions.html for details.


