When to use it
You use the Wilcoxon signed-rank test when there are two nominal variables and one measurement variable. One of the nominal variables has only two values, such as "before" and "after," and the other nominal variable often represents individuals. This is the non-parametric analogue to the paired t-test, and should be used if the differences between pairs may not be normally distributed.
The null hypothesis is that the median difference between pairs of observations is zero. Note that this is different from the null hypothesis of the paired t-test, which is that the mean difference between pairs is zero, or the null hypothesis of the sign test, which is that the numbers of differences in each direction are equal.
How it works
The absolute values of the differences between observations are ranked from smallest to largest, with the smallest difference getting a rank of 1, the next larger difference getting a rank of 2, and so on. Ties are given average ranks. The ranks of all differences in one direction are summed, and the ranks of all differences in the other direction are summed. The smaller of these two sums is the test statistic, W (sometimes symbolized Ts). Unlike most test statistics, smaller values of W are less likely under the null hypothesis.
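The ranking procedure just described can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet or SAS procedure; the function name signed_rank_w and the toy data are hypothetical. It assumes that pairs with a difference of exactly zero are dropped before ranking (a common convention that the description above does not cover), and it rounds the differences so that floating-point noise cannot hide a genuine tie.

```python
def signed_rank_w(before, after):
    """Return (W, positive rank sum, negative rank sum) for paired observations."""
    # Difference for each pair (after minus before). Rounding guards against
    # floating-point noise (e.g. 11.2 - 8.1) hiding genuine ties.
    diffs = [round(y - x, 9) for x, y in zip(before, after)]
    # Assumption: pairs with a zero difference are dropped before ranking.
    diffs = [d for d in diffs if d != 0]
    # Pair indices ordered from smallest to largest absolute difference.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of absolute differences tied with position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Positions i..j would get ranks i+1..j+1; ties share their average.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # The test statistic is the smaller of the two sums.
    return min(positive, negative), positive, negative


before = [10, 12, 9, 14, 11, 8]   # hypothetical toy data
after = [13, 11, 14, 16, 12, 15]
print(signed_rank_w(before, after))  # → (1.5, 19.5, 1.5)
```

In the toy data the differences are 3, -1, 5, 2, 1 and 7; the two differences of absolute size 1 tie for ranks 1 and 2, so each gets 1.5, and the single negative difference gives W = 1.5.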
Laureysens et al. (2004) measured metal content in the wood of 13 poplar clones growing in a polluted area, once in August and once in November. Concentrations of aluminum (in micrograms of Al per gram of wood) are shown below.
Clone            Aug    Nov
Balsam Spire     8.1   11.2
Beaupre         10.0   16.3
Hazendans       16.5   15.3
Hoogvorst       13.6   15.6
Raspalje         9.5   10.5
Unal             8.3   15.5
Columbia River  18.3   12.7
Fritzi Pauley   13.3   11.1
Trichobel        7.9   19.9
Gaver            8.1   20.4
Gibecq           8.9   14.2
Primo           12.6   12.7
Wolterson       13.4   36.8
There are two nominal variables: time of year (August or November) and poplar clone (Balsam Spire, Beaupre, etc.), and one measurement variable (micrograms of aluminum per gram of wood). There are not enough observations to confidently test whether the differences between August and November are normally distributed, but they look like they might be a bit skewed; the Wolterson clone, in particular, has a much larger difference than any other clone. To be safe, the authors analyzed the data using a signed-rank test. The median change from August to November (3.1 micrograms Al/g wood) is significantly different from zero (W=16, P=0.040).
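To show where a P value like this can come from, here is a minimal Python sketch of the exact null distribution of W, assuming no tied and no zero differences (true for the poplar data). Under the null hypothesis each difference is equally likely to be positive or negative, so all 2^n ways of assigning signs to the ranks 1 to n are equally likely. The hypothetical function exact_signed_rank_p counts how many of those assignments give a rank sum at or below W and doubles that tail for a two-sided test. It is an illustration, not necessarily the calculation the authors or any particular program used.

```python
def exact_signed_rank_p(w, n):
    """Two-sided exact P for integer rank sum w among n untied, nonzero differences."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of ways to choose which ranks are "positive" so that
    # they add up to s; under the null all 2**n choices are equally likely.
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        # Iterate downwards so each rank is used at most once (subset-sum DP).
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    # Small sums are the unlikely ones, so take the lower tail, then double it.
    lower_tail = sum(counts[: w + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Poplar data: W = 16 with n = 13 clones.
print(round(exact_signed_rank_p(16, 13), 4))  # → 0.0398
```

For the poplar data, 163 of the 8192 equally likely sign assignments give a rank sum of 16 or less, so P = 2 × 163/8192 ≈ 0.040, matching the value reported above.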
Buchwalder and Huber-Eicher (2004) wanted to know whether turkeys would be less aggressive towards unfamiliar individuals if they were housed in larger pens. They tested 10 groups of three turkeys that had been reared together, introducing an unfamiliar turkey and then counting the number of times it was pecked during the test period. Each group of turkeys was tested in a small pen and in a large pen. There are two nominal variables, size of pen (small or large) and the group of turkeys, and one measurement variable (number of pecks per test). The median difference between the number of pecks per test in the small pen vs. the large pen was significantly greater than zero (W=10, P=0.04).
Ho et al. (2004) inserted a plastic implant into the soft palate of 12 chronic snorers to see if it would reduce the volume of snoring. Snoring loudness was judged by the sleeping partner of the snorer on a subjective 10-point scale. There are two nominal variables, time (before the operation or after the operation) and individual snorer, and one measurement variable (loudness of snoring). One person left the study, and the implant fell out of the palate in two people; in the remaining nine people, the median change in snoring volume was significantly different from zero (W=0, P=0.008).
Graphing the results
You should graph the data for a signed-rank test the same way you would graph the data for a paired t-test: a bar graph showing either the two values side by side for each pair, or the difference for each pair.
Similar tests
Paired observations of a measurement variable may be analyzed using a paired t-test, if the null hypothesis is that the mean difference between pairs of observations is zero and the differences are normally distributed. If you have a large number of paired observations, you can plot a histogram of the differences to see if they look normally distributed. I do not know how severe the deviation from normality has to be to make the paired t-test inappropriate.
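For comparison with the signed-rank test, here is a minimal Python sketch of the paired t statistic (mean difference divided by its standard error), applied to the poplar data with differences taken as August minus November. The function name paired_t is hypothetical, and the sketch stops at the t statistic, since turning it into a P value needs the t distribution, which the Python standard library does not provide. The result agrees with the Student's t row of the SAS output for the same data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(before, after)]
    # stdev() is the sample standard deviation (n - 1 denominator).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


aug = [8.1, 10.0, 16.5, 13.6, 9.5, 8.3, 18.3, 13.3, 7.9, 8.1, 8.9, 12.6, 13.4]
nov = [11.2, 16.3, 15.3, 15.6, 10.5, 15.5, 12.7, 11.1, 19.9, 20.4, 14.2, 12.7, 36.8]
print(round(paired_t(aug, nov), 4))  # → -2.3089
```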
The sign test is used when the null hypothesis is that there are equal numbers of differences in each direction.
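A minimal sketch of an exact two-sided sign test in Python, treated as a binomial test with probability 0.5 and assuming zero differences are excluded; the function name sign_test_p is hypothetical. In the poplar data, 3 clones decreased and 10 increased from August to November. The sign test ignores how large those changes were, and here it gives P ≈ 0.092 (the same value as the Sign row of the SAS output), whereas the signed-rank test on the same data gives P=0.040.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Two-sided exact sign test; zero differences should already be excluded."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # Probability, under a 50:50 null, of a split at least this lopsided
    # in one particular direction.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Poplar data: 3 clones decreased, 10 increased.
print(round(sign_test_p(3, 10), 4))  # → 0.0923
```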
How to do the test
I have prepared a spreadsheet to do the Wilcoxon signed-rank test. It will handle up to 1000 pairs of observations.
There is a web page that will perform the Wilcoxon signed-rank test. You may enter your paired numbers directly onto the web page; it will be easier if you enter them into a spreadsheet first, then copy them and paste them into the web page.
To do a Wilcoxon signed-rank test in SAS, you first create a new variable that is the difference between the two observations. You then run PROC UNIVARIATE on the difference, which automatically does the Wilcoxon signed-rank test along with several others. Here's an example using the poplar data from above:
data poplars;
   length clone $ 14;          /* default length of 8 would truncate names like Columbia_River */
   input clone $ aug_al nov_al;
   diff = aug_al - nov_al;     /* August minus November for each clone */
   cards;
Balsam_Spire 8.1 11.2
Beaupre 10.0 16.3
Hazendans 16.5 15.3
Hoogvorst 13.6 15.6
Raspalje 9.5 10.5
Unal 8.3 15.5
Columbia_River 18.3 12.7
Fritzi_Pauley 13.3 11.1
Trichobel 7.9 19.9
Gaver 8.1 20.4
Gibecq 8.9 14.2
Primo 12.6 12.7
Wolterson 13.4 36.8
;
proc univariate data=poplars;
   var diff;
run;
PROC UNIVARIATE returns a bunch of descriptive statistics that you don't need; the result of the Wilcoxon signed-rank test is shown in the row labelled "Signed Rank":
                 Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t   -2.3089    Pr > |t|     0.0396
Sign           M      -3.5    Pr >= |M|    0.0923
Signed Rank    S     -29.5    Pr >= |S|    0.0398
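SAS reports the signed-rank statistic as S rather than W. As I understand the PROC UNIVARIATE documentation, S is the sum of the ranks of the positive differences minus its null expectation, S = W+ - n(n+1)/4, where n counts only the nonzero differences; treat that formula as an assumption to check against your SAS version. Under that assumption, this small Python sketch (the function name sas_s_to_w is hypothetical) recovers both rank sums and W from S.

```python
def sas_s_to_w(s, n):
    """Recover (W+, W-, W) from SAS's centered signed-rank statistic S."""
    # Assumed definition: S = W_plus - n(n+1)/4, n = number of nonzero differences.
    w_plus = s + n * (n + 1) / 4
    # The two rank sums always add up to n(n+1)/2.
    w_minus = n * (n + 1) / 2 - w_plus
    return w_plus, w_minus, min(w_plus, w_minus)


# Poplar data: S = -29.5 with 13 nonzero differences.
print(sas_s_to_w(-29.5, 13))  # → (16.0, 75.0, 16.0)
```

This gives W=16 for the poplar data, the same value as in the example above.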
Further reading
Sokal and Rohlf, pp. 440-444.
Zar, pp. 165-169.
References
Buchwalder, T., and B. Huber-Eicher. 2004. Effect of increased floor space on aggressive behaviour in male turkeys (Melagris gallopavo). Appl. Anim. Behav. Sci. 89: 207-214.
Ho, W.K., W.I. Wei, and K.F. Chung. 2004. Managing disturbing snoring with palatal implants: a pilot study. Arch. Otolaryngology Head and Neck Surg. 130: 753-758.
Laureysens, I., R. Blust, L. De Temmerman, C. Lemmens and R. Ceulemans. 2004. Clonal variation in heavy metal accumulation and biomass production in a poplar coppice culture. I. Seasonal variation in leaf, wood and bark concentrations. Environ. Pollution 131: 485-494.
This page was last revised September 6, 2009. Its address is http://udel.edu/~mcdonald/statsignedrank.html. It may be cited as pp. 198-201 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.