Chi-square test of independence
The chi-square test may be used both as a test of goodness-of-fit (comparing frequencies of one nominal variable to theoretical expectations) and as a test of independence (comparing frequencies of one nominal variable for different values of a second nominal variable). The underlying arithmetic of the test is the same; the only difference is the way the expected values are calculated. However, goodness-of-fit tests and tests of independence are used for quite different experimental designs and test different null hypotheses, so I treat the chi-square test of goodness-of-fit and the chi-square test of independence as two distinct statistical tests.
The chi-square test of independence is an alternative to the G-test of independence. Most of the information on this page is identical to that on the G-test page. You should read the section on "Chi-square vs. G-test", pick either chi-square or G-test, then stick with that choice for the rest of your life.
When to use it
The chi-square test of independence is used when you have two nominal variables, each with two or more possible values. A data set like this is often called an "R×C table," where R is the number of rows and C is the number of columns. For example, if you surveyed the frequencies of three flower phenotypes (red, pink, white) in four geographic locations, you would have a 3×4 table. You could also consider it a 4×3 table; it doesn't matter which variable is the columns and which is the rows.
It is also possible to do a chi-square test of independence with more than two nominal variables, but that experimental design doesn't occur very often and is rather complicated to analyze and interpret, so I won't cover it.
Null hypothesis
The null hypothesis is that the relative proportions of one variable are independent of the second variable; in other words, the proportions for one variable are the same for different values of the second variable. In the flower example, the null hypothesis is that the proportions of red, pink and white flowers are the same at the four geographic locations.
For some experiments, you can express the null hypothesis in two different ways, and either would make sense. For example, when an individual clasps their hands, there is one comfortable position; either the right thumb is on top, or the left thumb is on top. Downey (1926) collected data on the frequency of right-thumb vs. left-thumb clasping in right-handed and left-handed individuals. You could say that the null hypothesis is that the proportion of right-thumb-clasping is the same for right-handed and left-handed individuals, or you could say that the proportion of right-handedness is the same for right-thumb-clasping and left-thumb-clasping individuals.
For other experiments, it only makes sense to express the null hypothesis one way. In the flower example, it would make sense to say that the null hypothesis is that the proportions of red, pink and white flowers are the same at the four geographic locations; it wouldn't make sense to say that the proportion of flowers at each location is the same for red, pink, and white flowers.
How the test works
The math of the chi-square test of independence is the same as for the chi-square test of goodness-of-fit; only the method of calculating the expected frequencies is different. For the goodness-of-fit test, a theoretical relationship is used to calculate the expected frequencies. For the test of independence, the expected frequencies are calculated from the observed frequencies alone. For the hand-clasping example, Downey (1926) found 190 right-thumb-claspers and 149 left-thumb-claspers among right-handed women, and 42 right-thumb-claspers and 49 left-thumb-claspers among left-handed women. To calculate the expected frequency of right-thumb-claspers among right-handed women, you would first calculate the overall proportion of right-thumb-claspers: (190+42)/(190+42+149+49)=0.5395. Then you would multiply this overall proportion by the total number of right-handed women: 0.5395×(190+149)=182.9. This is the expected number of right-handed right-thumb-claspers under the null hypothesis; the observed number is 190. Similar calculations would be done for each of the cells in this 2×2 table of numbers.
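The expected-frequency arithmetic can be sketched in a few lines of Python. This is only an illustration, not one of the tools this handbook provides, and the function name expected_counts is made up for the example. It uses the shortcut (row total × column total)/(grand total), which is algebraically the same as multiplying the overall proportion by the row total.

```python
# Observed counts from Downey (1926).
observed = [
    [190, 149],  # right-handed women: right-thumb, left-thumb
    [42, 49],    # left-handed women: right-thumb, left-thumb
]

def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)

# Print each row of observed counts next to its expected counts.
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

The first expected value should agree with the 182.9 worked out by hand, and the expected counts add up to the same grand total as the observed counts.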
The degrees of freedom in a test of independence are equal to (number of rows−1)×(number of columns−1). Thus for a 2×2 table, there are (2−1)×(2−1)=1 degree of freedom; for a 4×3 table, there are (4−1)×(3−1)=6 degrees of freedom.
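Putting the pieces together, here is a hedged Python sketch that computes the test statistic (the sum of (observed−expected)²/expected over all cells) and the degrees of freedom for any R×C table. The function name chi_square_independence is hypothetical. The P-value line relies on a closed form that is valid only for 1 degree of freedom; larger tables need a chi-square distribution function from a statistics package.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hand-clasping data from Downey (1926).
handclasp = [[190, 149], [42, 49]]
chi2, df = chi_square_independence(handclasp)

# Upper-tail probability; this shortcut holds only when df == 1.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square={chi2:.4f}, {df} d.f., P={p_value:.4f}")
```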
Examples
Gardemann et al. (1998) surveyed genotypes at an insertion/deletion polymorphism of the apolipoprotein B signal peptide in 2259 men. Of men without coronary artery disease, 268 had the ins/ins genotype, 199 had the ins/del genotype, and 42 had the del/del genotype. Of men with coronary artery disease, there were 807 ins/ins, 759 ins/del, and 184 del/del.
The two nominal variables are genotype (ins/ins, ins/del, or del/del) and disease (with or without). The biological null hypothesis is that the apolipoprotein polymorphism doesn't affect the likelihood of getting coronary artery disease. The statistical null hypothesis is that the proportions of men with coronary artery disease are the same for each of the three genotypes.
The result is chi-square=7.26, 2 d.f., P=0.027. This indicates that the null hypothesis can be rejected; the three genotypes have significantly different proportions of men with coronary artery disease.
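To see where these numbers come from, the following Python sketch (an illustration only, with made-up variable names) computes the proportion of men with coronary artery disease for each genotype, then the chi-square statistic. The P-value uses a closed form, exp(−chi-square/2), that applies only when there are exactly 2 degrees of freedom.

```python
import math

genotypes = ["ins/ins", "ins/del", "del/del"]
# Rows: men without, then with, coronary artery disease (Gardemann et al. 1998).
observed = [
    [268, 199, 42],
    [807, 759, 184],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# The statistical null hypothesis says these three proportions are equal.
for j, genotype in enumerate(genotypes):
    print(genotype, round(observed[1][j] / col_totals[j], 3))

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(genotypes) - 1)
p_value = math.exp(-chi2 / 2)  # valid only for df == 2
print(f"chi-square={chi2:.2f}, {df} d.f., P={p_value:.3f}")
```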
Young and Winn (2003) counted sightings of the spotted moray eel, Gymnothorax moringa, and the purplemouth moray eel, G. vicinus, in a 150-m by 250-m area of reef in Belize. They identified each eel they saw, and classified the locations of the sightings into three types: those in grass beds, those in sand and rubble, and those within one meter of the border between grass and sand/rubble. The numbers of sightings are shown in the table, with percentages in parentheses:
| Habitat | G. moringa | G. vicinus |
|---|---|---|
| Grass | 127 (25.9) | 116 (33.7) |
| Sand | 99 (20.2) | 67 (19.5) |
| Border | 264 (53.9) | 161 (46.8) |
The nominal variables are the species of eel (G. moringa or G. vicinus) and the habitat type (grass, sand, or border). The difference in habitat use between the species is significant (chi-square=6.26, 2 d.f., P=0.044).
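As a rough check on the eel example, this Python sketch (illustrative only; the variable names are invented here) recomputes the percentages of each species' sightings in each habitat, then the chi-square statistic. As with any 3×2 table there are 2 degrees of freedom, so the P-value can be obtained from the closed form exp(−chi-square/2), which does not apply to other degrees of freedom.

```python
import math

habitats = ["Grass", "Sand", "Border"]
species = ["G. moringa", "G. vicinus"]
# Rows follow `habitats`, columns follow `species` (Young and Winn 2003).
sightings = [
    [127, 116],
    [99, 67],
    [264, 161],
]

row_totals = [sum(row) for row in sightings]
col_totals = [sum(col) for col in zip(*sightings)]
grand_total = sum(row_totals)

# Percentage of each species' sightings that fell in each habitat.
percentages = [
    [100 * count / col_totals[j] for j, count in enumerate(row)]
    for row in sightings
]
for habitat, row in zip(habitats, percentages):
    print(habitat, [round(pct, 1) for pct in row])

chi2 = 0.0
for i, row in enumerate(sightings):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(habitats) - 1) * (len(species) - 1)
p_value = math.exp(-chi2 / 2)  # valid only for df == 2
print(f"chi-square={chi2:.2f}, {df} d.f., P={p_value:.3f}")
```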
Graphing the results
The data used in a test of independence are usually displayed with a bar graph, with the values of one variable on the X-axis and the proportions of the other variable on the Y-axis. If the variable on the Y-axis only has two values, you only need to plot one of them:
[Figure: A bar graph for when the nominal variable has only two values.]
If the variable on the Y-axis has more than two values, you should plot all of them. Sometimes pie charts are used for this:
[Figure: A pie chart for when the nominal variable has more than two values.]
But as much as I like pie, I think pie charts make it difficult to see small differences in the proportions. In this situation, I prefer bar graphs:
[Figure: A bar graph for when the nominal variable has more than two values.]
Similar tests
There are several tests that use chi-square statistics. The one described here is formally known as Pearson's chi-square. It is by far the most common chi-square test, so it is usually just called the chi-square test.
If the expected numbers in some classes are small, the chi-square test will give inaccurate results. In that case, you should use Fisher's exact test if there are only two variables with two classes in each. See the web page on small sample sizes for further discussion.
Chi-square vs. G-test
The chi-square test gives approximately the same results as the G-test. Unlike chi-square values, G-values are additive, which means they can be used for more elaborate statistical designs. G-tests are a subclass of likelihood ratio tests, a general category of tests that have many uses for testing the fit of data to mathematical models; the more elaborate versions of likelihood ratio tests don't have equivalent tests using the Pearson chi-square statistic. The G-test is therefore preferred by many, even for simpler designs. On the other hand, the chi-square test is more familiar to more people, and it's always a good idea to use statistics that your readers are familiar with when possible. You may want to look at the literature in your field and see which is more commonly used.
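To illustrate how close the two statistics usually are, here is a small Python sketch (not part of the handbook's own tools) that computes both for the hand-clasping data. It assumes the usual definition of the G statistic, 2 times the sum of observed×ln(observed/expected), with the same expected values as the chi-square test.

```python
import math

# Hand-clasping data from Downey (1926).
observed = [[190, 149], [42, 49]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
g = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected      # Pearson chi-square term
        g += 2 * count * math.log(count / expected)      # G (likelihood ratio) term

print(f"chi-square={chi2:.4f}, G={g:.4f}")
```

The two values differ only in the third significant digit for these data, and the G value corresponds to the "Likelihood Ratio Chi-Square" line that SAS reports for the same table.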
Power analysis
For a 2×2 table with equal sample sizes, you can use this power analysis web page to do your power analysis. This web page is set up for one-tailed tests, rather than the more common two-tailed tests, so enter alpha = 2.5 percent instead of alpha = 5 percent. If you enter 50% for sample 1 percentage, 55% for sample 2 percentage, 2.5% for alpha and 10% for beta, the results will say "Sample size = 2094 for both samples!" This means that each sample size would have to be 2094, or a total of 4188 in the two samples put together.
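If you want to check a number like this yourself, the following Python sketch uses a common normal-approximation formula for comparing two proportions with equal sample sizes and no continuity correction. It is an assumption that the web page does something similar; the function name and parameters are invented for this example. For the inputs above it comes out at about 2094 per sample.

```python
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha_two_tailed=0.05, power=0.90):
    """Approximate sample size per group for comparing two proportions.

    Normal approximation, equal group sizes, no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_tailed / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Standard deviation terms under the null and alternative hypotheses.
    term_null = z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
    term_alt = z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5
    return (term_null + term_alt) ** 2 / (p1 - p2) ** 2

# 50% vs. 55%, two-tailed alpha of 5% (2.5% in each tail), beta of 10%.
n_per_sample = sample_size_two_proportions(0.50, 0.55)
print(round(n_per_sample), "per sample;", 2 * round(n_per_sample), "in total")
```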
For a 2×2 table with unequal sample sizes, you can use this power analysis web page. This page is set up for two-tailed tests, so enter alpha = 0.05. Enter power (which is 1−beta) instead of beta; if you want the probability of a false negative (beta) to be 10 percent, enter 90 in the "power" box.
I don't know how to do a power analysis if one or both variables have more than two values.
How to do the test
Spreadsheet
I have set up a spreadsheet that performs this test for up to 10 columns and 50 rows. It is largely self-explanatory; you just enter your observed numbers, and the spreadsheet calculates the chi-square test statistic, the degrees of freedom, and the P-value.
Web page
There are many web pages that do chi-square tests of independence, but most are limited to fairly small numbers of rows and columns. One page that will handle large data sets is here.
SAS
Here is a SAS program that uses PROC FREQ for a chi-square test. It uses the handclasping data from above.
data handclasp;
   input thumb $ hand $ count;
   cards;
rightthumb righthand 190
leftthumb righthand 149
rightthumb lefthand 42
leftthumb lefthand 49
;
proc freq data=handclasp;
   weight count / zeros;
   tables thumb*hand / chisq;
run;
The output includes the following:
Statistics for Table of thumb by hand
Statistic DF Value Prob
-------------------------------------------------------------
Chi-Square 1 2.8265 0.0927
Likelihood Ratio Chi-Square 1 2.8187 0.0932
Continuity Adj. Chi-Square 1 2.4423 0.1181
Cochran–Mantel–Haenszel Chi-Square 1 2.8199 0.0931
Phi Coefficient 0.0811
Contingency Coefficient 0.0808
Cramer's V 0.0811
The "Chi-Square" on the first line is the result of the chi-square test of independence; in this case, chi-square=2.8265, 1 d.f., P=0.0927.
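If you don't have SAS, most of the lines in this output can be cross-checked with a short Python sketch. The formulas below are the usual 2×2 shortcuts; it is an assumption that they match what PROC FREQ computes internally, although they reproduce the values shown above. The cells are arranged in alphabetical order of the category names, which appears to be how SAS orders the table.

```python
import math

# 2x2 table of counts, rows = thumb, columns = hand, in alphabetical order.
a, b = 49, 149   # left thumb on top: left-handed, right-handed
c, d = 42, 190   # right thumb on top: left-handed, right-handed

n = a + b + c + d
cross = a * d - b * c
denominator = (a + b) * (c + d) * (a + c) * (b + d)

chi2 = n * cross ** 2 / denominator                          # "Chi-Square"
chi2_adjusted = n * (abs(cross) - n / 2) ** 2 / denominator  # "Continuity Adj. Chi-Square"
mantel_haenszel = (n - 1) * cross ** 2 / denominator         # "...Mantel-Haenszel Chi-Square"
phi = cross / math.sqrt(denominator)                         # "Phi Coefficient"
contingency = math.sqrt(chi2 / (chi2 + n))                   # "Contingency Coefficient"
cramers_v = abs(phi)                                         # equals |phi| for a 2x2 table

# P-value for the chi-square test; this closed form holds only for 1 d.f.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square={chi2:.4f}, 1 d.f., P={p_value:.4f}")
print(f"continuity-adjusted chi-square={chi2_adjusted:.4f}")
print(f"phi={phi:.4f}, contingency coefficient={contingency:.4f}")
```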
Further reading
Sokal and Rohlf, pp. 736-737.
Zar, pp. 486-500.
References
Downey, J.E. 1926. Further observations on the manner of clasping the hands. American Naturalist 60: 387-391.
Gardemann, A., D. Ohly, M. Fink, N. Katz, H. Tillmanns, F.W. Hehrlein, and W. Haberbosch. 1998. Association of the insertion/deletion gene polymorphism of the apolipoprotein B signal peptide with myocardial infarction. Atherosclerosis 141: 167-175.
Young, R.F., and H.E. Winn. 2003. Activity patterns, diet, and shelter site use for two species of moray eels, Gymnothorax moringa and Gymnothorax vicinus, in Belize. Copeia 2003: 44-55.
This page was last revised August 5, 2008. Its address is http://udel.edu/~mcdonald/statchiind.html. It may be cited as pp. 52-57 in: McDonald, J.H. 2008. Handbook of Biological Statistics. Sparky House Publishing, Baltimore, Maryland.
©2008 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.



