The chi-square test may be used both as a test of goodness-of-fit (comparing frequencies of one nominal variable to theoretical expectations) and as a test of independence (comparing frequencies of one nominal variable for different values of a second nominal variable). The underlying arithmetic of the test is the same; the only difference is the way the expected values are calculated. However, goodness-of-fit tests and tests of independence are used for quite different experimental designs and test different null hypotheses, so I treat the chi-square test of goodness-of-fit and the chi-square test of independence as two distinct statistical tests.
The chi-square test of independence is an alternative to the G-test of independence. Most of the information on this page is identical to that on the G-test page. You should read the section on "Chi-square vs. G-test", pick either chi-square or G-test, then stick with that choice for the rest of your life.
When to use it
The chi-squared test of independence is used when you have two nominal variables, each with two or more possible values. A data set like this is often called an "R×C table," where R is the number of rows and C is the number of columns. For example, if you surveyed the frequencies of three flower phenotypes (red, pink, white) in four geographic locations, you would have a 3×4 table. You could also consider it a 4×3 table; it doesn't matter which variable is the columns and which is the rows.
It is also possible to do a chi-squared test of independence with more than two nominal variables, but that experimental design doesn't occur very often and is rather complicated to analyze and interpret, so I won't cover it (except for the special case of repeated 2×2 tables, analyzed with the Cochran-Mantel-Haenszel test).
The null hypothesis is that the relative proportions of one variable are independent of the second variable; in other words, the proportions of one variable are the same for different values of the second variable. In the flower example, the null hypothesis is that the proportions of red, pink and white flowers are the same at the four geographic locations.
For some experiments, you can express the null hypothesis in two different ways, and either would make sense. For example, when people clasp their hands, each person has one comfortable position: either the right thumb is on top, or the left thumb is on top. Downey (1926) collected data on the frequency of right-thumb vs. left-thumb clasping in right-handed and left-handed individuals. You could say that the null hypothesis is that the proportion of right-thumb-clasping is the same for right-handed and left-handed individuals, or you could say that the proportion of right-handedness is the same for right-thumb-clasping and left-thumb-clasping individuals.
For other experiments, it only makes sense to express the null hypothesis one way. In the flower example, it would make sense to say that the null hypothesis is that the proportions of red, pink and white flowers are the same at the four geographic locations; it wouldn't make sense to say that the proportions of locations are the same for red, pink, and white flowers.
How the test works
The math of the chi-square test of independence is the same as for the chi-square test of goodness-of-fit; only the method of calculating the expected frequencies is different. For the goodness-of-fit test, a theoretical relationship is used to calculate the expected frequencies. For the test of independence, only the observed frequencies are used to calculate the expected frequencies. For the hand-clasping example, Downey (1926) found 190 right-thumb and 149 left-thumb-claspers among right-handed women, and 42 right-thumb and 49 left-thumb-claspers among left-handed women. To calculate the expected frequency of right-thumb-claspers among right-handed women, you would first calculate the overall proportion of right-thumb-claspers: (190+42)/(190+42+149+49)=0.5395. Then you would multiply this overall proportion by the total number of right-handed women, 0.5395×(190+149)=182.9. This is the expected number of right-handed right-thumb-claspers under the null hypothesis; the observed number is 190. Similar calculations would be done for each of the cells in this 2×2 table of numbers.
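Here is a minimal Python sketch of this calculation for a table of any size. It uses the form (row total × column total) / grand total. That gives the same answer as (overall proportion for that column) × (row total), which is how the calculation is described above. The function name `expected_counts` is just an illustrative choice.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Downey (1926) hand-clasping data for women.
# Rows: right-handed, left-handed; columns: right thumb on top, left thumb on top.
handclasp = [
    [190, 149],
    [42, 49],
]

expected = expected_counts(handclasp)
for row in expected:
    print([round(e, 1) for e in row])
# [182.9, 156.1]
# [49.1, 41.9]
```

The first value, 182.9, is the expected number of right-handed right-thumb-claspers worked out by hand above.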
The degrees of freedom in a test of independence are equal to (number of rows − 1) × (number of columns − 1). Thus for a 2×2 table, there are (2−1)×(2−1)=1 degree of freedom; for a 4×3 table, there are (4−1)×(3−1)=6 degrees of freedom.
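For reference, the whole calculation can be written compactly. The notation here is introduced for convenience. O_ij and E_ij are the observed and expected counts in row i and column j. R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns.

$$E_{ij} = \frac{R_i\,C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad \text{d.f.} = (r-1)(c-1)$$

In a 2×2 table every observed count differs from its expected count by the same absolute amount. For the hand-clasping data that amount is 190 − 182.90 = 7.10, so

$$\chi^2 = 7.098^2\left(\frac{1}{182.90}+\frac{1}{156.10}+\frac{1}{49.10}+\frac{1}{41.90}\right) \approx 2.83, \qquad \text{d.f.} = (2-1)(2-1) = 1$$

This agrees with the chi-square value of 2.8265 in the SAS output shown in the "How to do the test" section.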
Gardemann et al. (1998) surveyed genotypes at an insertion/deletion polymorphism of the apolipoprotein B signal peptide in 2259 men. Of men without coronary artery disease, 268 had the ins/ins genotype, 199 had the ins/del genotype, and 42 had the del/del genotype. Of men with coronary artery disease, there were 807 ins/ins, 759 ins/del, and 184 del/del.
The two nominal variables are genotype (ins/ins, ins/del, or del/del) and disease (with or without). The biological null hypothesis is that the apolipoprotein polymorphism doesn't affect the likelihood of getting coronary artery disease. The statistical null hypothesis is that the proportions of men with coronary artery disease are the same for each of the three genotypes.
The result is chi-square=7.26, 2 d.f., P=0.027. This indicates that the null hypothesis can be rejected; the three genotypes have significantly different proportions of men with coronary artery disease.
Young and Winn (2003) counted sightings of the spotted moray eel, Gymnothorax moringa, and the purplemouth moray eel, G. vicinus, in a 150-m by 250-m area of reef in Belize. They identified each eel they saw, and classified the locations of the sightings into three types: those in grass beds, those in sand and rubble, and those within one meter of the border between grass and sand/rubble. The numbers of sightings are shown in the table, with percentages of each species' sightings in parentheses:
    Habitat    G. moringa    G. vicinus
    Grass      127 (25.9)    116 (33.7)
    Sand        99 (20.2)     67 (19.5)
    Border     264 (53.9)    161 (46.8)
The nominal variables are the species of eel (G. moringa or G. vicinus) and the habitat type (grass, sand, or border). The difference in habitat use between the species is significant (chi-square=6.26, 2 d.f., P=0.044).
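The sketch below does the whole test in a few lines of Python using only the standard library, and applies it to both worked examples. It is not the spreadsheet or the SAS procedure described on this page, and the helper names `expected_counts`, `chi2_sf` and `chi_square_independence` are hypothetical. The P-value uses the closed-form upper tail of the chi-square distribution that exists for whole-number degrees of freedom, so no statistics library is needed.

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the 1-d.f. tail, erfc(sqrt(x/2)), plus a finite series of terms.
    sf = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(3, df + 1, 2):
        sf += term
        term *= x / k
    return sf


def chi_square_independence(observed):
    """Return (Pearson chi-square, degrees of freedom, P-value) for an R x C table."""
    expected = expected_counts(observed)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Gardemann et al. (1998): rows = without / with coronary artery disease,
# columns = ins/ins, ins/del, del/del.
apob = [[268, 199, 42], [807, 759, 184]]
# Young and Winn (2003): rows = grass, sand, border; columns = G. moringa, G. vicinus.
eels = [[127, 116], [99, 67], [264, 161]]

for name, table in [("apolipoprotein B", apob), ("moray eels", eels)]:
    stat, df, p = chi_square_independence(table)
    print(f"{name}: chi-square={stat:.2f}, {df} d.f., P={p:.3f}")
# apolipoprotein B: chi-square=7.26, 2 d.f., P=0.027
# moray eels: chi-square=6.26, 2 d.f., P=0.044
```

If SciPy is available, `scipy.stats.chi2_contingency` does the same job. Note that by default it applies a continuity correction to 2×2 tables, so pass `correction=False` if you want the uncorrected Pearson value.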
The data used in a test of independence are usually displayed with a bar graph, with the values of one variable on the X-axis and the proportions of the other variable on the Y-axis. If the variable on the Y-axis only has two values, you only need to plot one of them:
|A bar graph for when the nominal variable has only two values.|
If the variable on the Y-axis has more than two values, you should plot all of them. Sometimes pie charts are used for this:
|A pie chart for when the nominal variable has more than two values.|
But as much as I like pie, I think pie charts make it difficult to see small differences in the proportions, and difficult to show confidence intervals. In this situation, I prefer bar graphs:
|A bar graph for when the nominal variable has more than two values.|
There are several tests that use chi-square statistics. The one described here is formally known as Pearson's chi-square. It is by far the most common chi-square test, so it is usually just called the chi-square test.
If the expected numbers in some classes are small, the chi-squared test will give inaccurate results. In that case, you should try Fisher's exact test; if that doesn't work (because the total sample size is too big, or because there are too many values of one of the nominal variables), you can use the
If the samples are not independent, but instead are before-and-after observations on the same individuals, you should use McNemar's test.
The chi-square test gives approximately the same results as the G-test. Unlike chi-square values, G-values are additive, which means they can be used for more elaborate statistical designs. G-tests are a subclass of likelihood ratio tests, a general category of tests that have many uses for testing the fit of data to mathematical models; the more elaborate versions of likelihood ratio tests don't have equivalent tests using the Pearson chi-square statistic. The G-test is therefore preferred by many, even for simpler designs. On the other hand, the chi-square test is more familiar to more people, and it's always a good idea to use statistics that your readers are familiar with when possible. You may want to look at the literature in your field and see which is more commonly used.
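To see how close the two statistics usually are, the sketch below computes both from the same expected counts for the hand-clasping data. It uses Pearson's chi-square, the sum of (O−E)²/E, and the G-statistic, 2 times the sum of O·ln(O/E). It uses plain Python with the standard library, and the function names are hypothetical.

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_and_g(observed):
    """Return (Pearson chi-square, G) for an R x C table of counts."""
    expected = expected_counts(observed)
    pairs = [(o, e) for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row)]
    pearson = sum((o - e) ** 2 / e for o, e in pairs)
    # Cells with an observed count of zero contribute nothing to G (0 * ln 0 is taken as 0).
    g = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    return pearson, g


handclasp = [[190, 149], [42, 49]]
pearson, g = pearson_and_g(handclasp)
print(f"Pearson chi-square = {pearson:.4f}, G = {g:.4f}")
```

Both statistics are compared to the same chi-square distribution, with (rows − 1) × (columns − 1) degrees of freedom. The two values should agree closely with the "Chi-Square" and "Likelihood Ratio Chi-Square" lines of the SAS output on this page, which are 2.8265 and 2.8187.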
How to do the test
I have set up a spreadsheet that performs this test for up to 10 columns and 50 rows. It is largely self-explanatory; you just enter your observed numbers, and the spreadsheet calculates the chi-squared test statistic, the degrees of freedom, and the P-value.
There are many web pages that do chi-squared tests of independence, but most are limited to fairly small numbers of rows and columns. One page that will handle large data sets is here. Robert Huber has put together a web page that will do a chi-squared test of independence for up to a 10×10 table. Be sure to scroll to the bottom of the page and set the number of rows and columns.
Here is a SAS program that uses PROC FREQ for a chi-square test. It uses the handclasping data from above.
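    data handclasp;
       length thumb $ 10 hand $ 9;  /* keep full category names; list input would otherwise truncate them to 8 characters */
       input thumb $ hand $ count;
       cards;
    rightthumb righthand 190
    leftthumb  righthand 149
    rightthumb lefthand   42
    leftthumb  lefthand   49
    ;
    proc freq data=handclasp;
       weight count / zeros;
       tables thumb*hand / chisq;
    run;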
The output includes the following:
    Statistics for Table of thumb by hand

    Statistic                              DF     Value     Prob
    -------------------------------------------------------------
    Chi-Square                              1    2.8265   0.0927
    Likelihood Ratio Chi-Square             1    2.8187   0.0932
    Continuity Adj. Chi-Square              1    2.4423   0.1181
    Cochran–Mantel–Haenszel Chi-Square      1    2.8199   0.0931
    Phi Coefficient                              0.0811
    Contingency Coefficient                      0.0808
    Cramer's V                                   0.0811
The "Chi-Square" on the first line is the result of the chi-square test: the Value column gives the test statistic and the Prob column gives the P-value. In this case, chi-square=2.8265, 1 d.f., P=0.0927.
To estimate the sample size needed for a test of independence on a 2×2 table, you can use the technique described for Fisher's exact test, even if the resulting sample size will be much too large to actually do Fisher's exact test.
For a test with more than 2 rows or columns, the G*Power program will calculate the sample size needed for a test of independence, but you need to calculate the effect size parameter, w, separately. The chi-squared test of independence spreadsheet can be used for this. Enter the data you hope to see; you can enter proportions, percentages, or raw numbers. Then go to G*Power and choose "Chi-squared tests" from the "Test family" menu and "Goodness-of-fit tests: Contingency tables" from the "Statistical test" menu. Copy the "Effect size w" from the spreadsheet to the G*Power form, then enter your alpha (usually 0.05), your power (often 0.8 or 0.9), and your degrees of freedom (for a test with R rows and C columns, remember that the degrees of freedom are (R−1)×(C−1)). This analysis assumes that your total sample will be divided equally among the groups; if it isn't, you'll need a larger sample size than the one you estimate.
As an example, let's say you're looking for a relationship between bladder cancer and genotypes at a polymorphism in the catechol-O-methyltransferase gene in humans. In the population you're studying, you know that the genotype frequencies in people without bladder cancer are 0.36 GG, 0.48 GA, and 0.16 AA; you want to know how many people with bladder cancer you'll have to genotype to get a significant result if they have 6 percentage points more AA genotypes (0.22 instead of 0.16). Enter 0.36, 0.48, and 0.16 in the first column of the spreadsheet, and 0.33, 0.45, and 0.22 in the second column; the effect size (w) is 0.10838. Enter this in the G*Power page, enter 0.05 for alpha, 0.80 for power, and 2 for degrees of freedom. The result is a total sample size of 821, so you'll need 411 people with bladder cancer and 411 people without bladder cancer.
Sokal and Rohlf, pp. 736-737.
Zar, pp. 486-500.
Downey, J.E. 1926. Further observations on the manner of clasping the hands. American Naturalist 60: 387-391.
Gardemann, A., D. Ohly, M. Fink, N. Katz, H. Tillmanns, F.W. Hehrlein, and W. Haberbosch. 1998. Association of the insertion/deletion gene polymorphism of the apolipoprotein B signal peptide with myocardial infarction. Atherosclerosis 141: 167-175.
Young, R.F., and H.E. Winn. 2003. Activity patterns, diet, and shelter site use for two species of moray eels, Gymnothorax moringa and Gymnothorax vicinus, in Belize. Copeia 2003: 44-55.
This page was last revised September 11, 2009. Its address is http://udel.edu/~mcdonald/statchiind.html. It may be cited as pp. 57-63 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.