When to use it
You use the sign test when there are two nominal variables and one measurement variable or ranked variable. One of the nominal variables has only two values, such as "before" and "after" or "left" and "right," and the other nominal variable identifies the pairs of observations. The data could be analyzed using a paired t-test or a Wilcoxon signed-rank test, if the null hypothesis is that the mean or median difference between pairs of observations is zero. The sign test is used to test the null hypothesis that there are equal numbers of differences in each direction.
One situation in which a sign test is appropriate is when the biological null hypothesis is that there may be large differences between pairs of observations, but they are random in direction. For example, let's say you want to know whether copper pennies will reduce the number of mosquito larvae in backyard ponds. You measure the abundance of larvae in your pond, add some pennies, then measure the abundance of larvae again a month later. You do this for several other backyard ponds, with the before- and after-pennies measurements at different times in the summer for each pond. Based on prior research, you know that mosquito larvae abundance varies a lot throughout the summer, due to variation in the weather and random fluctuations in the number of adult mosquitoes that happen to find a particular pond; even if the pennies have no effect, you expect big differences in the abundance of larvae between the before and after samples. The random fluctuations in abundance would be random in direction, however, so if the pennies have no effect, you'd expect half the ponds to have more larvae before adding the pennies, and half the ponds to have more larvae after adding pennies.
To see why a paired t-test would be inappropriate for the mosquito experiment, imagine that you've done the experiment in a neighborhood with 100 backyard ponds. Due to changes in the weather, etc., the abundance of mosquito larvae increases in half the ponds and decreases in half the ponds; in other words, the probability that a random pond will decrease in mosquito larvae abundance is 0.5. If you do the experiment on four ponds picked at random, and all four happen to show the same direction of difference (all four increase or all four decrease) even though the pennies really have no effect, you'll probably get a significant paired t-test. However, the probability that all four ponds will show the same direction of change is 2×0.5⁴, or 0.125. Thus you'd get a "significant" P-value from the paired t-test 12.5% of the time, which is much higher than the P<0.05 you want.
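The arithmetic above can be checked with a short Python sketch. The function name `prob_all_same_direction` is made up for this illustration, and the sketch assumes, as the text does, that each pond changes direction independently with probability 0.5.

```python
def prob_all_same_direction(n_ponds: int, p_decrease: float = 0.5) -> float:
    """Probability that every one of n_ponds changes in the same direction."""
    p_increase = 1.0 - p_decrease
    # Either all ponds decrease or all ponds increase
    return p_decrease ** n_ponds + p_increase ** n_ponds


for n in (4, 5, 6):
    print(n, prob_all_same_direction(n))
# 4 0.125
# 5 0.0625
# 6 0.03125
```

With fewer than six ponds, even a unanimous result has a probability above 0.05 under the null hypothesis.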
The other time you'd use a sign test is when you don't know the size of the difference, only its direction; in other words, you have a ranked variable with only two values, "greater" and "smaller." For example, let's say you're comparing the abundance of adult mosquitoes between your front yard and your back yard. You stand in your front yard for five minutes, intending to count every mosquito that lands on you, but they are so abundant that soon you're dancing around, swatting yourself wildly, with no hope of getting an accurate count. You then do the same in your back yard for five minutes and rate the mosquito abundance in your back yard as either "more annoying" or "less annoying" than your front yard. You repeat this on several subsequent days. You don't have any numbers for mosquito abundance, but you can do a sign test and see whether there are significantly more times where your front yard has more mosquitoes than your back yard, or vice versa.
Null hypothesis
The null hypothesis is that an equal number of pairs of observations have a change in each direction. If the pairs are "before" and "after," the null hypothesis would be that the number of pairs showing an increase equals the number showing a decrease.
Note that this is different from the null hypothesis tested by a paired t-test, which is that the mean difference between pairs is zero. The difference would be illustrated by a data set in which 19 pairs had an increase of 1 unit, while one pair had a decrease of 19 units. The 19:1 ratio of increases to decreases would be highly significant under a sign test, but the mean change would be zero.
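A minimal Python sketch of that 19-versus-1 data set follows. The helper `sign_test_p` is a name chosen for this illustration; it assumes the two-tailed P-value is twice the one-tailed exact binomial probability, which is valid because the null distribution is symmetric at a 1:1 expectation.

```python
from math import comb


def sign_test_p(n_plus: int, n_minus: int) -> float:
    """Two-tailed exact binomial P-value against a 1:1 expectation."""
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of a split at least this lopsided in one direction
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it for the two-tailed test, capped at 1
    return min(1.0, 2 * one_tail)


# 19 pairs increase by 1 unit, one pair decreases by 19 units
differences = [1] * 19 + [-19]

mean_change = sum(differences) / len(differences)
n_plus = sum(d > 0 for d in differences)
n_minus = sum(d < 0 for d in differences)
p_value = sign_test_p(n_plus, n_minus)

print(mean_change)      # 0.0, so the mean difference is zero
print(n_plus, n_minus)  # 19 1
print(p_value)          # about 0.00004
```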
Examples
Farrell et al. (2001) estimated the evolutionary tree of two subfamilies of beetles that burrow inside trees as adults. They found ten pairs of sister groups in which one group of related species, or "clade," fed on angiosperms and one fed on gymnosperms, and they counted the number of species in each clade. There are two nominal variables, food source (angiosperms or gymnosperms) and pair of clades (Corthylina vs. Pityophthorus, etc.) and one measurement variable, the number of species per clade.
The biological null hypothesis is that although the number of species per clade may vary widely due to a variety of unknown factors, whether a clade feeds on angiosperms or gymnosperms will not be one of these factors. In other words, you expect that each pair of related clades will differ in number of species, but half the time the angiosperm-feeding clade will have more species, and half the time the gymnosperm-feeding clade will have more species.
Applying a sign test, there are 10 pairs of clades in which the angiosperm-specialized clade has more species, and 0 pairs with more species in the gymnosperm-specialized clade; this is significantly different from the null expectation (P=0.002), and you can reject the null hypothesis and conclude that in these beetles, clades that feed on angiosperms tend to have more species than clades that feed on gymnosperms.
Angiosperm-feeding               Spp.    Gymnosperm-feeding               Spp.
Corthylina                        458    Pityophthorus                     200
Scolytinae                       5200    Hylastini + Tomacini              180
Acanthotomicus + Premnobious      123    Orhotomicus                        11
Xyleborini/Dryocoetini           1500    Ipini                             195
Apion                            1500    Antliarhininae                     12
Belinae                           150    Allocoryninae + Oxycorinae         30
Higher Curculionidae            44002    Nemonychidae                       85
Higher Cerambycidae             25000    Aseminae + Spondylinae             78
Megalopodinae                     400    Palophaginae                        3
Higher Chrysomelidae            33400    Aulocoscelinae + Orsodacninae      26
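The P-value quoted for the beetle data can be reproduced from the species counts in the table. This is a sketch using only the Python standard library; the variable names are illustrative, and the two-tailed P-value is assumed to be twice the one-tailed exact binomial probability.

```python
from math import comb

# (angiosperm-feeding species, gymnosperm-feeding species) for each pair of clades
pairs = [
    (458, 200), (5200, 180), (123, 11), (1500, 195), (1500, 12),
    (150, 30), (44002, 85), (25000, 78), (400, 3), (33400, 26),
]

n_angio_more = sum(a > g for a, g in pairs)
n_gymno_more = sum(a < g for a, g in pairs)

n = n_angio_more + n_gymno_more          # ties, if any, would be left out
k = min(n_angio_more, n_gymno_more)
one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
p_value = min(1.0, 2 * one_tail)

print(n_angio_more, n_gymno_more)  # 10 0
print(round(p_value, 3))           # 0.002
```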
Sherwin (2004) wanted to know whether laboratory mice preferred having a mirror in their cage. He set up 16 pairs of connected cages, one with a mirror and one without, and put a solitary mouse in each pair of cages. He then measured the amount of time each mouse spent in each of its two cages. There are two nominal variables, mirror (present or absent) and the individual mouse, and one measurement variable, the time spent in each cage. Three of the 16 mice spent more time in the cage with a mirror, and 13 mice spent more time in the cage without a mirror. The result of a sign test is P=0.021, so you can reject the null hypothesis that the number of mice that prefer a mirror equals the number of mice that prefer not having a mirror.
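As a worked check of the mouse result, assuming the two-tailed P-value is twice the one-tailed exact binomial probability of 3 or fewer mice out of 16 choosing the mirror cage:

```latex
P = 2 \sum_{k=0}^{3} \binom{16}{k} \left(\tfrac{1}{2}\right)^{16}
  = \frac{2\,(1 + 16 + 120 + 560)}{65536}
  = \frac{1394}{65536}
  \approx 0.021
```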
McDonald (1991) counted allele frequencies at the mannose-6-phosphate isomerase (Mpi) locus in the amphipod crustacean Orchestia grillus from six bays on the north shore of Long Island, New York. At each bay two sites were sampled, one outside the bay ("exposed") and one inside the bay ("protected"). There are three nominal variables: allele ("fast" or "slow"), habitat ("exposed" or "protected"), and bay. The allele frequencies at each bay were converted to a ranked variable with two values: Mpi^fast more common at the exposed site than the protected site, or Mpi^fast less common at the exposed site. At all six bays, Mpi^fast was less common at the exposed site, which is significant by a sign test (P=0.03).
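With all six bays differing in the same direction, the two-tailed probability under the 1:1 null hypothesis works out as follows (the same calculation as in the four-pond illustration, with six pairs instead of four):

```latex
P = 2 \times \left(\tfrac{1}{2}\right)^{6} = \frac{1}{32} \approx 0.031
```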
Note that this experimental design is identical to the study of Lap allele frequencies in the mussel Mytilus trossulus inside and outside of Oregon estuaries that was used as an example for the Cochran–Mantel–Haenszel test. Although the experimental designs are the same, the biological questions are different, which makes the Cochran–Mantel–Haenszel test appropriate for the mussels and the sign test appropriate for the amphipods.
Two evolutionary processes can cause allele frequencies to be different between different locations: natural selection or random genetic drift. Mussels have larvae that float around in the water for a few weeks before settling onto rocks, so I considered it very unlikely that random genetic drift would cause a difference in allele frequencies between locations just a few kilometers apart. Therefore the biological null hypothesis is that the absence of natural selection keeps the allele frequencies the same inside and outside of estuaries; any difference in allele frequency between marine and estuarine habitats would be evidence for natural selection. The Cochran–Mantel–Haenszel test is a test of the statistical null hypothesis that the allele frequencies are the same in the two habitats, so the significant result is evidence that Lap in the mussels is affected by natural selection.
The amphipod Orchestia grillus does not have larvae that float in the water; the female amphipods carry the young in a brood pouch until they're ready to hop around on their own. The amphipods live near the high tide line in marshes, so the exposed and protected sites are likely to be well isolated populations with little migration between them. Therefore differences between exposed and protected sites due to random genetic drift are quite likely, and it wouldn't have been very interesting to find them. Random genetic drift, however, is random in direction; if it were the only process affecting allele frequencies, the Mpi^fast allele would be more common inside half the bays and less common inside half the bays. The significant sign test indicates that the direction of difference in allele frequency is not random, so the biological null hypothesis of differences due to random drift can be rejected and the alternative hypothesis of differences due to natural selection can be accepted.
Graphing the results
You should graph the data for a sign test the same way you would graph the data for a paired t-test: a bar graph with either the values side-by-side for each pair, or the differences at each pair.
Similar tests
Paired observations of a measurement variable may be analyzed using a paired t-test, if the null hypothesis is that the mean difference between pairs of observations is zero, or a Wilcoxon signed-rank test, if the null hypothesis is that the median difference between pairs of observations is zero. The sign test is used when the null hypothesis is that there are equal numbers of differences in each direction.
How to do the test
First, count the number of pairs of observations with an increase (plus signs) and the number of pairs with a decrease (minus signs). Ignore pairs with no difference. Compare the ratio of plus signs to minus signs with the expected 1:1 ratio using the exact binomial test spreadsheet.
You can use Richard Lowry's exact binomial test web page to do a sign test, once you've counted the number of differences in each direction by hand.
PROC UNIVARIATE automatically does a sign test; see the example on the Wilcoxon signed-rank web page.
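If you would rather script the test than use the spreadsheet, web page, or SAS, the counting procedure described in this section can be sketched in Python with only the standard library. The function name `sign_test` and the before/after values are invented for illustration; the P-value is assumed to be twice the one-tailed exact binomial probability, as is appropriate for a 1:1 expectation.

```python
from math import comb


def sign_test(first, second):
    """Return (n_plus, n_minus, two-tailed P) for paired observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n_plus = sum(d > 0 for d in diffs)    # pairs with an increase
    n_minus = sum(d < 0 for d in diffs)   # pairs with a decrease
    n = n_plus + n_minus                  # pairs with no difference are ignored
    if n == 0:
        raise ValueError("every pair is tied; the sign test is undefined")
    k = min(n_plus, n_minus)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_plus, n_minus, min(1.0, 2 * one_tail)


# Hypothetical measurements, made up only to exercise the function
before = [12, 30, 7, 45, 18, 22, 9, 16]
after = [10, 21, 7, 33, 20, 15, 4, 11]

result = sign_test(before, after)
print(result)  # (1, 6, 0.125)
```

The third pair is tied (7 and 7), so only seven of the eight pairs enter the test.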
Power analysis
Because the sign test is just an application of the exact binomial test, you can use the sample size calculator for the exact binomial test.
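As a rough illustration of what such a power calculation involves, here is a Python sketch. It is not the method used by the calculator mentioned above, which may differ in detail. The function names, the alternative proportion of 0.9, and the target power of 0.8 are all assumptions chosen for the example.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def sign_test_p(k: int, n: int) -> float:
    """Two-tailed exact P-value for k plus signs out of n, 1:1 expected."""
    tail = min(k, n - k)
    return min(1.0, 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n)


def sign_test_power(n: int, p_alt: float, alpha: float = 0.05) -> float:
    """Chance of P <= alpha when the true proportion of plus signs is p_alt."""
    return sum(binom_pmf(k, n, p_alt) for k in range(n + 1)
               if sign_test_p(k, n) <= alpha)


def smallest_n(p_alt: float, power: float = 0.8, alpha: float = 0.05,
               n_max: int = 1000):
    """First n (up to n_max) whose power reaches the target, else None."""
    for n in range(1, n_max + 1):
        if sign_test_power(n, p_alt, alpha) >= power:
            return n
    return None


# Hypothetical scenario: 90% of pairs are expected to change in one direction
n_needed = smallest_n(0.9)
print(n_needed, sign_test_power(n_needed, 0.9))
```

Because the test is discrete, power does not rise smoothly with sample size; a slightly larger n can have slightly lower power, so treat the first n that reaches the target as a guide rather than a guarantee.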
Further reading
Sokal and Rohlf 1995, pp. 444-445.
Zar, pp. 538-539.
References
Farrell, B.D., A.S. Sequeira, B.C. O'Meara, B.B. Normark, J.H. Chung, and B.H. Jordal. 2001. The evolution of agriculture in beetles (Curculionidae: Scolytinae and Platypodinae). Evolution 55: 2011-2027.
McDonald, J.H. 1991. Contrasting amounts of geographic variation as evidence for direct selection: the Mpi and Pgm loci in eight crustacean species. Heredity 67: 215-219.
Sherwin, C.M. 2004. Mirrors as potential environmental enrichment for individually housed laboratory mice. Appl. Anim. Behav. Sci. 87: 95-103.
This page was last revised September 14, 2009. Its address is http://udel.edu/~mcdonald/statsign.html. It may be cited as pp. 202-206 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.