When to use it
Spearman rank correlation is used when you have two measurement variables and one "hidden" nominal variable. The nominal variable groups the measurements into pairs; if you've measured height and weight of a bunch of people, "individual name" is a nominal variable. You want to see whether the two measurement variables covary, that is, whether the other variable tends to increase or decrease as one variable increases. It is the non-parametric alternative to correlation, and it is used when the data do not meet the assumptions about normality, homoscedasticity and linearity. Spearman rank correlation is also used when one or both of the variables consist of ranks.
You will rarely have enough data in your own data set to test the normality and homoscedasticity assumptions of regression and correlation; your decision about whether to do linear regression and correlation or Spearman rank correlation will usually depend on your prior knowledge of whether the variables are likely to meet the assumptions.
Null hypothesis
The null hypothesis is that the ranks of one variable do not covary with the ranks of the other variable; in other words, as the ranks of one variable increase, the ranks of the other variable are not more likely to increase (or decrease).
How the test works
Spearman rank correlation works by converting each variable to ranks. Thus, if you're doing a Spearman rank correlation of blood pressure vs. body weight, the lightest person would get a rank of 1, second-lightest a rank of 2, etc. The lowest blood pressure would get a rank of 1, second lowest a rank of 2, etc. If one or both variables is already ranks, they remain unchanged, of course. When two or more observations are equal, the average rank is used. For example, if two observations are tied for ranks 2 and 3, they would each get a rank of 2.5 (the average of 2 and 3).
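The ranking step, including the average-rank rule for ties, can be sketched in a few lines of Python. This is only an illustration: the function name average_ranks and the body weights are hypothetical and are not taken from any data set on this page.

```python
def average_ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) occupy ranks i+1..j+1; use their average.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Hypothetical body weights in kg; the two 61s are tied for ranks 2 and 3.
weights = [70, 61, 55, 61, 82]
print(average_ranks(weights))  # [4.0, 2.5, 1.0, 2.5, 5.0]
```

The two tied weights each receive 2.5, and the next-heaviest person still gets rank 4, so the ranks always sum to the same total whether or not there are ties.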
Once the two variables are converted to ranks, a correlation analysis is done on the ranks. The correlation coefficient is calculated for the two columns of ranks, and the significance of this is tested in the same way as the correlation coefficient for a regular correlation. (This Spearman's correlation coefficient is also called Spearman's rho). The P-value from the correlation of ranks is the P-value of the Spearman rank correlation. The ranks are rarely graphed against each other, and a line is rarely used for either predictive or illustrative purposes, so you don't calculate a non-parametric equivalent of the regression line.
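As a rough sketch of the calculation described above, the following Python ranks each variable and then computes an ordinary correlation coefficient on the two columns of ranks. The function names and the weight and blood pressure values are made up for illustration; a statistics package would normally do this for you.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks with ties given the average of the ranks they occupy."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1   # lowest rank occupied by this value
        count = sorted_vals.count(v)       # how many observations share it
        ranks.append(first + (count - 1) / 2)
    return ranks


def pearson_r(x, y):
    """Ordinary (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the correlation coefficient of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical body weights (kg) and blood pressures (mm Hg) for five people.
weight = [55, 61, 70, 82, 90]
pressure = [110, 125, 118, 130, 142]
print(round(spearman_rho(weight, pressure), 4))  # 0.9
```

Only the order of the values matters here: replacing 142 with 500 would leave the ranks, and therefore rho, unchanged.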
Example
Males of the magnificent frigatebird (Fregata magnificens) have a large red throat pouch. They visually display this pouch and use it to make a drumming sound when seeking mates. Madsen et al. (2004) wanted to know whether females, who presumably choose mates based on their pouch size, could use the pitch of the drumming sound as an indicator of pouch size. The authors estimated the volume of the pouch and the fundamental frequency of the drumming sound in 18 males:
Volume, cm^3    Frequency, Hz
1760            529
====See the web page for the full data set====
There are two measurement variables, pouch size and pitch; the identity of each male is the hidden nominal variable. The authors analyzed the data using Spearman rank correlation, which converts the measurement variables to ranks, and the relationship between the variables is significant (Spearman's rho=-0.76, 16 d.f., P=0.0002). The authors do not explain why they used Spearman rank correlation; if they had used regular correlation, they would have obtained r=-0.82, P=0.00003.
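The 16 d.f. reported above are n-2 for the 18 males. Assuming the significance test is the usual t-test for a correlation coefficient, t = r*sqrt((n-2)/(1-r^2)), the test statistic can be roughly recovered from the published numbers. This is a back-of-the-envelope check, not the authors' calculation; the function name is hypothetical, and because rho is rounded to two decimals the result is approximate.

```python
from math import sqrt


def t_from_correlation(r, n):
    """t statistic for testing a correlation coefficient against zero (n - 2 d.f.)."""
    return r * sqrt((n - 2) / (1 - r * r))


rho, n = -0.76, 18          # values reported for the frigatebird data
df = n - 2
t = t_from_correlation(rho, n)
print(df, round(t, 2))      # 16 -4.68
```

A t-value of about -4.68 with 16 d.f. lies well beyond the two-tailed 0.001 critical value, which is in line with the reported P=0.0002.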
Graphing the results
If you have measurement data for both the X and Y variables, you could plot the results the same way you would for a linear regression. Don't put a regression line on the graph, however; you can't plot a rank correlation line on a graph with measurement variables on the axes, and it would be misleading to put a linear regression line on a graph when you've analyzed it with rank correlation.
If you actually have true ranked data for both variables, you could plot a line through them, I suppose. I'm not sure what the point would be, however.
How to do the test
I've put together a spreadsheet that will perform a Spearman rank correlation on up to 1000 observations. With small numbers of observations (10 or fewer), the P-value based on the equation using r^2 is inaccurate, so the spreadsheet looks up the P-value in a table of critical values.
This web page will do Spearman rank correlation.
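To illustrate why small samples need special handling, here is a minimal Python sketch that gets an exact two-tailed P-value by enumerating every possible pairing of the ranks. This is not how the spreadsheet works (it uses a lookup table); it is a different, brute-force technique that assumes there are no ties, and it is only practical for very small n because the number of permutations grows as n factorial. The function names and example ranks are hypothetical.

```python
from itertools import permutations


def sum_sq_diff(rx, ry):
    """D, the sum of squared differences between paired ranks."""
    return sum((a - b) ** 2 for a, b in zip(rx, ry))


def exact_spearman_p(rx, ry):
    """Exact two-tailed P-value for untied ranks, by enumerating all n! pairings.

    Without ties, rho = 1 - 6*D / (n*(n^2 - 1)), so rho = 0 corresponds to
    D = n*(n^2 - 1)/6, and |rho| is at least as large as observed exactly
    when D is at least as far from that center as the observed D.
    """
    n = len(rx)
    center = n * (n * n - 1) // 6          # always an integer
    observed = abs(sum_sq_diff(rx, ry) - center)
    hits = total = 0
    for perm in permutations(ry):
        total += 1
        if abs(sum_sq_diff(rx, perm) - center) >= observed:
            hits += 1
    return hits / total


# Hypothetical ranks for five observations; rho = 0.9 for this pairing.
x_ranks = [1, 2, 3, 4, 5]
y_ranks = [1, 3, 2, 4, 5]
print(round(exact_spearman_p(x_ranks, y_ranks), 4))  # 0.0833
```

With only five observations, even rho=0.9 is not significant at the 0.05 level: 10 of the 120 possible pairings give a rho at least that extreme.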
In SAS, use PROC CORR with the SPEARMAN option to do Spearman rank correlation. Here is an example using the bird data from the correlation and regression web page:
proc corr data=birds spearman;
   var species latitude;
run;
The results include the Spearman correlation coefficient, analogous to the r-value of a regular correlation, and the P-value:
Spearman Correlation Coefficients, N = 17
       Prob > |r| under H0: Rho=0

              species     latitude

species       1.00000     -0.36263    Spearman correlation coefficient
                            0.1526    P-value

latitude     -0.36263      1.00000
               0.1526
Sokal and Rohlf, pp. 598, 600.
Zar, pp. 395-398.
Madsen, V., T.J.S. Balsby, T. Dabelsteen, and J.L. Osorno. 2004. Bimodal signaling of a sexually selected trait: gular pouch drumming in the magnificent frigatebird. Condor 106: 156-160.
This page was last revised September 2, 2009. Its address is http://udel.edu/~mcdonald/statspearman.html. It may be cited as pp. 221-223 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.