# Student's t-test

Any statistical test that uses the t-distribution can be called a t-test. One of the most common is Student's t-test, named after "Student," the pseudonym that William Gosset used to hide his employment by the Guinness brewery in the early 1900s (they didn't want their competitors to know that they were making better beer with statistics). Student's t-test is used to compare the means of two samples. Other t-tests include tests to compare a single observation to a sample, or to compare a sample mean to a theoretical mean (I won't cover either of these, as they are not used very often in biology), and the paired t-test.

### When to use it

Use Student's t-test when you have one nominal variable and one measurement variable, and you want to compare the mean values of the measurement variable. The nominal variable must have only two values, such as "male" and "female" or "treated" and "untreated."

### Null hypothesis

The statistical null hypothesis is that the means of the measurement variable are equal for the two categories.

### How the test works

The test statistic, ts, is calculated using a formula that has the difference between the means in the numerator; this makes ts get larger as the means get farther apart. The denominator is the standard error of the difference in the means, which gets smaller as the sample variances decrease or the sample sizes increase. Thus ts gets larger as the means get farther apart, the variances get smaller, or the sample sizes increase.
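
For reference, the usual pooled-variance form of the statistic described above can be written out as follows. The symbols are notation I'm introducing here, not something defined elsewhere on this page: x̄ is a sample mean, s² a sample variance, n a sample size, and the subscripts 1 and 2 label the two groups.

```latex
t_s = \frac{\bar{x}_1 - \bar{x}_2}
           {\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

The pooled variance is a weighted average of the two sample variances, which is why the test assumes the two groups have equal variances.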

The probability of getting the observed ts value under the null hypothesis is calculated using the t-distribution. The shape of the t-distribution, and thus the probability of getting a particular ts value, depends on the number of degrees of freedom. The degrees of freedom for a t-test is the total number of observations in the groups minus 2, or n1+n2-2.
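
To see how the P-value depends on the degrees of freedom, here is a minimal Python sketch that integrates the t-distribution numerically using only the standard library. The function names `t_pdf` and `two_tailed_p` are my own, and Simpson's rule is just one convenient way to approximate the area; a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return 1.0 - 2.0 * central_area

# The same ts value gives a larger P-value when there are fewer degrees of freedom.
for df in (5, 32, 1000):
    print(df, round(two_tailed_p(2.0, df), 3))
```

With very few degrees of freedom the tails of the distribution are heavy, so a given ts value is less surprising under the null hypothesis.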

### Assumptions

The t-test assumes that the observations within each group are normally distributed and the variances are equal in the two groups. It is not particularly sensitive to deviations from these assumptions, but if the data are very non-normal, the Mann-Whitney U-test can be used. Welch's t-test can be used if the variances are unequal.

### Example

In fall 2004, students in the 2 p.m. section of my Biological Data Analysis class had an average height of 66.6 inches, while the average height in the 5 p.m. section was 64.6 inches. Are the average heights of the two sections significantly different? Here are the data:

```
2 p.m. 5 p.m.
69     68
70     62
66     67
63     68
68     69
70     67
69     61
67     59
62     62
63     61
76     69
59     66
62     62
62     62
75     61
62     70
72
63

```

There is one measurement variable, height, and one nominal variable, class section. The null hypothesis is that the mean heights in the two sections are the same. The results of the t-test (t=1.29, 32 d.f., P=0.21) do not reject the null hypothesis.
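
If you'd like to check the t-value and degrees of freedom by hand, the following Python sketch computes them from the height data using the pooled-variance formula. It uses only the standard library; the function name `student_t` is my own, and it returns only the test statistic and degrees of freedom, not the P-value.

```python
from statistics import mean, variance

two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]

def student_t(a, b):
    """Return (t, df) for Student's t-test with a pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two means.
    std_err = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(a) - mean(b)) / std_err, df

t, df = student_t(two_pm, five_pm)
print(f"t = {t:.2f}, d.f. = {df}")  # t = 1.29, d.f. = 32
```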

### Graphing the results

Because it's just comparing two numbers, you'd rarely put the results of a t-test in a graph for publication. For a presentation, you could draw a bar graph like the one for a one-way anova.

### Similar tests

Student's t-test is mathematically identical to a one-way anova done on data with two categories. The t-test is easier to do and is familiar to more people, but it is limited to just two categories of data. The anova can be done on two or more categories. If your research always involves comparing just two means, I recommend the t-test for that reason. If you write a paper that includes some comparisons of two means and some comparisons of more than two means, you may want to call all the tests one-way anovas, rather than switching back and forth between two different names (t-test and one-way anova) for what is essentially the same thing.
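
One way to convince yourself of this equivalence is to compute both statistics for the same data: with two categories, the F-statistic from a one-way anova equals the square of ts. The Python sketch below does this for the height data from the example; the variable names are my own, and it only compares the test statistics, not the P-values.

```python
from statistics import mean, variance

two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]
groups = [two_pm, five_pm]
all_values = two_pm + five_pm

# One-way anova: F = (mean square among groups) / (mean square within groups).
grand_mean = mean(all_values)
ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((len(g) - 1) * variance(g) for g in groups)
df_among = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_among / df_among) / (ss_within / df_within)

# Student's t-test: the pooled variance is the same as the within-group mean square.
n1, n2 = len(two_pm), len(five_pm)
pooled_var = ss_within / df_within
t_stat = (mean(two_pm) - mean(five_pm)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

print(f"F = {f_stat:.4f}, t squared = {t_stat ** 2:.4f}")
```

The two printed numbers agree, and the within-group degrees of freedom of the anova are the same as the degrees of freedom of the t-test.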

If the data are not normally distributed, and they can't be made normal using data transformations, it may be better to compare the ranks using a Mann-Whitney U-test. Student's t-test is not very sensitive to deviations from the normal distribution, so unless the non-normality is really dramatically obvious, you can use the t-test.

If the variances are far from equal, you can use Welch's t-test for unequal variances; you can do it in a spreadsheet using "=TTEST(array1, array2, tails, type)" by entering "3" for "type" instead of "2". You can also do Welch's t-test using this web page. The spreadsheet described on the homoscedasticity page can help you decide whether the difference in variances is big enough that you should use Welch's t-test.
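
As a rough illustration of what Welch's t-test does differently, this Python sketch computes the test statistic without pooling the variances, and estimates the degrees of freedom with the Welch-Satterthwaite approximation. The function name `welch_t` is my own, and the data are the class heights from the example above.

```python
from statistics import mean, variance

two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]

def welch_t(a, b):
    """Return (t, df) for Welch's t-test for unequal variances."""
    # Squared standard error of each mean, kept separate rather than pooled.
    v1 = variance(a) / len(a)
    v2 = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (v1 + v2) ** 0.5
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(two_pm, five_pm)
print(f"t = {t:.2f}, d.f. = {df:.1f}")
```

Because the variances in the two sections are not very different, the result is close to that of Student's t-test for these data.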

The paired t-test is used when the measurement observations come in pairs, such as comparing the strength of the right arm with the strength of the left arm on a set of people.

### How to do the test

The easiest way to do the test is with the TTEST function. This takes the form "=TTEST(array1, array2, tails, type)". "Array1" is the set of cells with the measurement variables from your first class of observations, and "array2" is the set of cells with your second class of observations. "Tails" is either 1 (for a one-tailed test) or 2 (for a two-tailed test). You'll almost always want to do a two-tailed test. To do a regular t-test, enter "2" for the "type." The function returns the P-value of the test.

For the above height data, enter the first column of numbers in cells A2 through A19, and the second column of numbers in cells B2 through B17. In an empty cell, enter "=TTEST(A2:A19, B2:B17, 2, 2)". The result is P=0.207, so the difference in means is not significant.

If you want to report the t-value, there is no spreadsheet function to directly calculate it; you'll have to use SAS or one of the web pages linked below. The degrees of freedom is just the total number of observations minus 2.

#### Web pages

There are web pages to do the t-test here, here, here, and here.

#### SAS

You can use PROC TTEST for Student's t-test; the CLASS parameter is the nominal variable, and the VAR parameter is the measurement variable. Here is an example program for the height data above.

```
data sectionheights;
input section $ height;
cards;
2pm 69
2pm 70
2pm 66
2pm 63
2pm 68
2pm 70
2pm 69
2pm 67
2pm 62
2pm 63
2pm 76
2pm 59
2pm 62
2pm 62
2pm 75
2pm 62
2pm 72
2pm 63
5pm 68
5pm 62
5pm 67
5pm 68
5pm 69
5pm 67
5pm 61
5pm 59
5pm 62
5pm 61
5pm 69
5pm 66
5pm 62
5pm 62
5pm 61
5pm 70
proc ttest;
class section;
var height;
run;

```

The output includes a lot of information; the P-value for the Student's t-test is under "Pr > |t|" on the line labelled "Pooled". For these data, the P-value is 0.2067.

```
Variable        Method     Variances     DF    t Value    Pr > |t|

height          Pooled         Equal     32       1.29      0.2067
height   Satterthwaite       Unequal   31.2       1.31      0.1995

```

### Power analysis

To estimate the sample sizes needed to detect a significant difference between two means, you need the following:

- the effect size, or the difference in means you hope to detect;
- the standard deviation. Usually you'll use the same value for each group, but if you know ahead of time that one group will have a larger standard deviation than the other, you can use different numbers;
- alpha, or the significance level (usually 0.05);
- beta, the probability of accepting the null hypothesis when it is false. The power of the test is 1−beta (0.50, 0.80 and 0.90 are common values for power);
- the ratio of one sample size to the other. The most powerful design is to have equal numbers in each group (N1/N2=1.0), but sometimes it's easier to get large numbers of one of the groups. For example, if you're comparing the bone strength in mice that have been reared in zero gravity aboard the International Space Station vs. control mice reared on earth, you might decide ahead of time to use three control mice for every one space mouse (N1/N2=3.0).

As an example, let's say you're planning a clinical trial of Niaspan in people with low levels of HDL (the "good cholesterol"). You're going to take a bunch of people with low HDL, give half of them Niaspan, and give the rest of them a placebo. The average HDL level before the trial is 32 mg/dl, and you decide you want to detect a difference of 10 percent (3.2 mg/dl), at the P<0.05 level, with a probability of detecting a difference this large, if it exists, of 80 percent (1−beta=0.80). Based on prior research, you estimate the standard deviation as 4.3 mg/dl in each group.

On the power analysis form on the web page, enter 3.2 for "Difference in means", 4.3 for both of the "Standard deviation" values, 0.05 for the alpha, 0.80 for the power, and 1.0 for N1/N2. The result is 29, so you'll need a minimum of 29 people in your placebo group and 29 in the Niaspan group.
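
If you don't have the web page handy, you can get a rough answer with the common normal-approximation formula for sample size. The Python sketch below is only that: it assumes a two-tailed test, the same standard deviation in both groups, and equal group sizes, and I don't know whether the web page uses exactly this calculation. The function name `n_per_group` is my own.

```python
import math
from statistics import NormalDist

def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-tailed, two-sample comparison.

    Normal approximation, assuming equal standard deviations and equal group sizes.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # value corresponding to power (1 - beta)
    return math.ceil(2 * ((z_alpha + z_power) * sd / diff) ** 2)

# Niaspan example: detect a 3.2 mg/dl difference with a standard deviation of 4.3 mg/dl.
print(n_per_group(3.2, 4.3))  # 29
```

A calculation based on the t-distribution instead of the normal distribution can give a slightly larger number, so treat the result as a minimum.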
