# Spearman rank correlation

### When to use it

Spearman rank correlation is used when you have two measurement variables and one "hidden" nominal variable. The nominal variable groups the measurements into pairs; if you've measured the height and weight of a bunch of people, "individual name" is the nominal variable. You want to see whether the two measurement variables covary: whether, as one variable increases, the other variable tends to increase or decrease. It is the non-parametric alternative to correlation, and it is used when the data do not meet the assumptions of normality, homoscedasticity and linearity. Spearman rank correlation is also used when one or both of the variables consist of ranks.

You will rarely have enough data in your own data set to test the normality and homoscedasticity assumptions of regression and correlation; your decision about whether to do linear regression and correlation or Spearman rank correlation will usually depend on your prior knowledge of whether the variables are likely to meet the assumptions.

### Null hypothesis

The null hypothesis is that the ranks of one variable do not covary with the ranks of the other variable; in other words, as the ranks of one variable increase, the ranks of the other variable are not more likely to increase (or decrease).

### How the test works

Spearman rank correlation works by converting each variable to ranks. Thus, if you're doing a Spearman rank correlation of blood pressure vs. body weight, the lightest person would get a rank of 1, the second-lightest a rank of 2, etc. The lowest blood pressure would get a rank of 1, the second-lowest a rank of 2, etc. If one or both variables are already ranks, they remain unchanged, of course. When two or more observations are equal, the average rank is used. For example, if two observations are tied for the second-lowest value, each would get a rank of 2.5 (the average of 2 and 3).
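The ranking step, including the average-rank rule for ties, can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` and the body weights are made up for the example, not taken from any data set discussed here.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their average rank."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` across the run of observations tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so they hold ranks start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical body weights (kg); two people tie for second-lightest.
weights = [82.0, 61.5, 70.2, 70.2, 95.3]
print(average_ranks(weights))  # [4.0, 1.0, 2.5, 2.5, 5.0]
```

The two tied weights would otherwise occupy ranks 2 and 3, so each receives 2.5.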

Once the two variables are converted to ranks, a correlation analysis is done on the ranks. The correlation coefficient is calculated for the two columns of ranks, and its significance is tested in the same way as the correlation coefficient for a regular correlation. (This correlation coefficient is also called Spearman's rho.) The P-value from the correlation of ranks is the P-value of the Spearman rank correlation. The ranks are rarely graphed against each other, and a line is rarely used for either predictive or illustrative purposes, so you don't calculate a non-parametric equivalent of the regression line.
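As a rough sketch of that second step, the following Python computes an ordinary correlation coefficient on two columns of ranks, then the usual t statistic for a correlation coefficient with n−2 degrees of freedom. The rank columns and the function names `pearson_r` and `t_statistic` are hypothetical, chosen only for illustration. Turning t into a P-value needs a t-distribution table or a statistics package, which is left out here; with this few observations, a table of critical values for rho would be the better choice anyway.

```python
import math


def pearson_r(x, y):
    """Ordinary (Pearson) correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data, already converted to ranks (no ties).
weight_ranks = [1, 2, 3, 4, 5, 6]
pressure_ranks = [2, 1, 4, 3, 6, 5]

# Spearman's rho is simply the correlation coefficient of the two rank columns.
rho = pearson_r(weight_ranks, pressure_ranks)
t = t_statistic(rho, len(weight_ranks))
print(f"rho = {rho:.4f}, t = {t:.3f}, d.f. = {len(weight_ranks) - 2}")
```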

### Example

Males
of the magnificent frigatebird (*Fregata magnificens*) have a large
red throat pouch. They visually display this pouch and use it to make a
drumming sound when seeking mates. Madsen et al. (2004) wanted to know
whether females, who presumably choose mates based on their pouch size,
could use the pitch of the drumming sound as an indicator of pouch size.
The authors estimated the volume of the pouch and the fundamental
frequency of the drumming sound in 18 males:

| Volume, cm³ | Frequency, Hz |
|-------------|---------------|
| 1760        | 529           |
| ...         | ...           |
| 7960        | 416           |

**See the web page for the full data set.**

There are two measurement variables, pouch size and pitch; the identity of each male is the hidden nominal variable. The authors analyzed the data using Spearman rank correlation, which converts the measurement variables to ranks, and the relationship between the variables is significant (Spearman's rho=-0.76, 16 d.f., P=0.0002). The authors do not explain why they used Spearman rank correlation; if they had used regular correlation, they would have obtained r=-0.82, P=0.00003.
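As a rough check on these numbers, the standard t statistic for a correlation coefficient can be computed from the rounded rho reported above. This assumes the usual formula with n−2 = 16 degrees of freedom; because rho is rounded to two decimals, the result is only approximate.

$$
t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}} = -0.76\sqrt{\frac{16}{1-(-0.76)^{2}}} = -0.76\sqrt{\frac{16}{0.4224}} \approx -4.68
$$

A t of this size with 16 d.f. is far out in the tail of the t-distribution, which is in line with the very small P-value the authors report.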

### Graphing the results

If you have measurement data for both of the X and Y variables, you could plot the results the same way you would for a linear regression. Don't put a regression line on the graph, however; you can't plot a rank correlation line on a graph with measurement variables on the axes, and it would be misleading to put a linear regression line on a graph when you've analyzed it with rank correlation.

If you actually have true ranked data for both variables, you could plot a line through them, I suppose. I'm not sure what the point would be, however.

### How to do the test

#### Spreadsheet

I've put together a spreadsheet that will perform a Spearman rank correlation on up to 1000 observations. With small numbers of observations (10 or fewer), the P-value based on the equation using r² is inaccurate, so the spreadsheet looks up the P-value in a table of critical values.

#### Web page

This web page will do Spearman rank correlation.

#### SAS

Use PROC CORR with the SPEARMAN option to do Spearman rank correlation. Here is an example using the bird data from the correlation and regression web page:

    proc corr data=birds spearman;
      var species latitude;
    run;

The results include the Spearman correlation coefficient, analogous to the r-value of a regular correlation, and the P-value:

    Spearman Correlation Coefficients, N = 17
           Prob > |r| under H0: Rho=0

                    species      latitude

    species         1.00000      -0.36263
                                   0.1526

    latitude       -0.36263       1.00000
                     0.1526

Here -0.36263 is the Spearman correlation coefficient and 0.1526 is the P-value.

### Further reading

Sokal and Rohlf, pp. 598, 600.

Zar, pp. 395-398.

### References

Madsen, V., T.J.S. Balsby, T. Dabelsteen, and J.L. Osorno. 2004. Bimodal signaling of a sexually selected trait: gular pouch drumming in the magnificent frigatebird. Condor 106: 156-160.


This page was last revised September 2, 2009. Its address is http://udel.edu/~mcdonald/statspearman.html. It may be cited as pp. 221-223 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.