Getting started with SAS



SAS, SPSS and Stata are some of the most popular software packages for doing serious statistics. I have a little experience with SAS, so I've prepared this web page to get you started on the basics. UCLA's Academic Technology Services department has prepared very useful guides to SAS, SPSS and Stata.

SAS may seem intimidating and old-fashioned; accomplishing anything with it requires writing what is, in essence, a computer program, one where a misplaced semicolon can have disastrous results. But I think that if you take a deep breath and work your way patiently through the examples, you'll soon be able to do some pretty cool statistics.

The instructions here are for the University of Delaware, but most of them should apply anywhere that SAS is installed. There are three ways of using SAS:

SAS runs on a mainframe computer, not your personal computer. You'll have to connect your personal computer to Strauss, one of the University of Delaware's mainframes. The operating system for Strauss is Unix; in order to run SAS on Strauss in batch mode, you'll have to learn a few Unix commands.

Getting connected to Strauss from a Mac

On a Mac, find the program Terminal; it should be in the Utilities folder, inside your Applications folder. You'll probably want to drag it to your Dock for easy access in the future. The first time you run Terminal, go to Preferences in the Terminal menu, choose Settings, then choose Advanced. Set "Declare terminal as:" to "vt100". Then check the box that says "Delete sends Ctrl-H". (Some versions of Terminal may have the preferences arranged somewhat differently, and you may need to look for a box to check that says "Delete key sends backspace.") Then quit and restart Terminal. You won't need to change these settings again.

When you start up Terminal, you'll get a prompt that looks like this:


Your-Names-Computer:~ yourname$

After the dollar sign, type ssh userid@strauss.udel.edu, where userid is your UDelNet ID, and hit return. It will ask you for your password; enter it. You'll then be connected to Strauss, and you'll get this prompt:


strauss.udel.edu% 

You're now ready to start typing Unix commands.

Getting connected to Strauss from Windows

On a Windows computer, see if the program SSH Secure Shell is on your computer, and if it isn't, download it from UDeploy. (If you're not at Delaware, ask your site administrator which "terminal emulator" they recommend for Windows). Start up the program, then click on "quick connect" and enter strauss.udel.edu for the host name and your UDelNet ID for the username. It will ask you for your password; if you enter it successfully, you'll get this prompt:


strauss.udel.edu% 

You're now ready to start typing Unix commands.

Getting connected to Strauss from Linux

If you're running Linux, you're already enough of a geek that you don't need my help getting connected to the mainframe.

A little bit of Unix

The operating system on Strauss is Unix, so you've got to learn a few Unix commands. Unix was apparently written by people for whom typing is very painful, as most of the commands are a small number of cryptic letters. Case does matter; don't enter CD and think it means the same thing as cd. Here is all the Unix you need to know to run SAS. Commands are in bold and file and directory names, which you choose, are in italics.

ls Lists all of the file names in your current directory.
pico filename pico is a text editor; you'll use it for writing SAS programs. Enter pico practice.sas to open an existing file named practice.sas, or create it if it doesn't exist. To exit pico, press the control and X keys together. You have to use the arrow keys, not the mouse, to move around the text once you're in a file. For this reason, I prefer to create and edit SAS programs in a text editor on my computer (TextEdit on a Mac, Notepad on Windows), then copy and paste them into a file I've created with pico. I then use pico for minor tweaking of the program. Note that there are other popular text editors, such as vi and emacs, and one of the defining characteristics of a serious computer geek is a strong opinion about the superiority of their favorite text editor and the total loserness of all other text editors. To avoid becoming one of them, try not to get emotional about pico.

Unix filenames should be made of letters and numbers, dashes (-), underscores (_), and periods. Don't use spaces or other punctuation (slashes, parentheses, exclamation marks, etc.), as they have special meanings in Unix and may confuse the computer. It is common to use an extension after a period, such as .sas to indicate a SAS program, but that is for your convenience in recognizing what kind of file it is; it isn't required by Unix.
cat filename Opens a file for viewing and printing, but not editing. It will automatically take you to the end of the file, so you'll have to scroll up. To print, you may want to copy what you want, then paste it into a word processor document for easier formatting.
mv oldname newname Changes the name of a file from oldname to newname. When you run SAS on the file practice.sas, the output will be in a file called practice.lst. Before you make changes to practice.sas and run it again, you may want to change the name of practice.lst to something else, so it won't be overwritten.
cp oldname newname Makes a copy of file oldname with the name newname.
rm filename Deletes a file.
logout Logs you out of Strauss.
mkdir directoryname Creates a new directory. You don't need to do this, but if you end up creating a lot of files, you may find it helpful to keep them organized into different directories.
cd directoryname Changes from one directory to another. For example, if you have a directory named sasfiles in your home directory, enter cd sasfiles. To go from within a directory up to your home directory, just enter cd.
rmdir directoryname Deletes a directory, if it doesn't have any files in it. If you want to delete a directory and the files in it, first go into the directory, delete all the files in it using rm, then delete the directory using rmdir.
sas filename Runs SAS. Be sure to enter sas filename.sas. If you just enter sas and then hit return, you'll be in interactive SAS mode, which is scary; enter ;endsas; if that happens and you need to get out of it.
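
Here is a rough sketch of how the file and directory commands above fit together in one session. It is written in Bourne shell (sh or bash) syntax. The names sasfiles, run1.lst and backup.sas are made up for illustration, and the sketch starts in a throwaway directory created with mktemp (which is not one of the commands listed above) so that nothing real gets deleted.

```shell
# Work in a throwaway directory so no real files are touched. This is an
# assumption of the sketch; on Strauss you would work in your home directory.
scratch=$(mktemp -d)
cd "$scratch"

mkdir sasfiles                 # make a directory for SAS work
cd sasfiles                    # move into it

# Stand-ins for a program typed in pico and for output from an earlier run.
printf '/* empty practice program */\n' > practice.sas
printf 'old results\n' > practice.lst

ls                             # lists practice.lst and practice.sas
mv practice.lst run1.lst       # rename old output so the next run won't overwrite it
cp practice.sas backup.sas     # keep a copy of the program
ls                             # lists backup.sas, practice.sas and run1.lst

rm backup.sas practice.sas run1.lst   # delete the files
cd ..                          # go up one level (plain cd would go to your home directory)
rmdir sasfiles                 # remove the now-empty directory
```

Note the order at the end: rmdir only works once the directory is empty and you are no longer inside it.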

Writing a SAS program

To use SAS, you first use pico to create an empty file; you can call the first one practice.sas. Then you type in the SAS program that you've written and save the file by hitting the control and X keys. Once you've exited pico, you enter sas practice.sas; the word sas is the command that tells Unix to run the SAS program, and practice.sas is the file it is going to run SAS on. SAS then creates a file named practice.log, which reports any errors. If there are no fatal errors, SAS also creates a file named practice.lst, which contains the results of the analysis.
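
If you expect to rerun a program many times, a small shell function along these lines could do the renaming and running in one step. This is only a sketch under some assumptions: the function name run_sas and the .prev.lst naming are hypothetical, the sas command is assumed to be available (as it is on Strauss), and the syntax is Bourne shell (sh or bash), so if your login shell is something else you would need to put it in a script and run it with sh.

```shell
# run_sas NAME: run NAME.sas, keeping any earlier NAME.lst as NAME.prev.lst.
# (run_sas is a hypothetical helper, not part of SAS or Unix.)
run_sas() {
    name=$1

    # Keep the previous results instead of letting SAS overwrite them.
    if [ -f "$name.lst" ]; then
        mv "$name.lst" "$name.prev.lst"
    fi

    sas "$name.sas"

    # SAS writes NAME.log every time, but NAME.lst only when
    # there were no fatal errors.
    if [ -f "$name.lst" ]; then
        echo "results are in $name.lst"
    else
        echo "no $name.lst produced; read $name.log"
    fi
}
```

You would then enter run_sas practice instead of sas practice.sas.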

The SAS program (which you write using pico) consists of a series of commands. Each command is one or more words, followed by a semicolon. You can put comments into your program to remind you of what you're trying to do; these comments have a slash and asterisk on each side, like this:


/*This is a comment. It is not read by the SAS program.*/

The SAS program has two basic parts, the DATA step and the PROC step. (Note--I'll capitalize all SAS commands to make them stand out, but you don't have to when you write your programs.) The DATA step reads in data, either from another file or from within the program.

In a DATA step, you first say "DATA dataset;" where dataset is an arbitrary name you give the dataset. Then you say "INPUT variable1 variable2...;" giving an arbitrary name to each of the variables that is on a line in your data. So if you have a data set consisting of the length and width of mussels from two different species, you could start the program by writing:


data mussel;
   input species $ length width;

A variable name for a nominal variable (a name or character) has a space and a dollar sign ($) after it. In our practice data set, "species" is a nominal variable. If you want to treat a number as a nominal variable, such as an ID number, remember to put a dollar sign after the name of the variable. Don't use spaces within variable names; use Medulis or M_edulis, not M. edulis (there are ways of handling variables containing spaces, but they're complicated).

If you are putting the data directly in the program, the next step is a line that says "CARDS;", followed by the data. A semicolon on a line by itself tells SAS it's done reading the data. Each observation is on a separate line, with the variables separated by one or more spaces:


data mussel;
   input species $ length width;
   cards;
edulis 49.0 11.0
trossulus 51.2 9.1
trossulus 45.9 9.4
edulis 56.2 13.2
edulis 52.7 10.7
edulis 48.4 10.4
trossulus 47.6 9.5
trossulus 46.2 8.9
trossulus 37.2 7.1
;

If you have a large data set, it will be more convenient to keep it in a separate file from your program. To read in data from another file, use the INFILE statement, with the name of the data file in single quotes. In this example, I use the FIRSTOBS option to tell SAS that the first observation is on line 2 of the data file, because line 1 has column headings that remind me what the variables are. You don't have to do this, but I find it's a good idea to have one or more lines of explanatory information at the start of a data file; otherwise, it's too easy to forget what all the numbers are.


data mussel;
   infile 'shells.dat' firstobs=2;    
   input species $ length width;     
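
To make the layout of such a data file concrete, here is a sketch in Bourne shell syntax that creates a small shells.dat with one heading line and then shows which lines FIRSTOBS=2 would skip and which it would read. The wording of the heading line is hypothetical, the three observations are taken from the mussel example, and head, tail and mktemp are standard Unix commands that are not covered elsewhere on this page.

```shell
cd "$(mktemp -d)"    # scratch directory, so no existing shells.dat is overwritten

# Line 1 is a heading for humans (hypothetical wording).
# Lines 2 onward are observations, separated by spaces.
cat > shells.dat <<'EOF'
species length width
edulis 49.0 11.0
tross 51.2 9.1
tross 45.9 9.4
EOF

head -n 1 shells.dat     # the heading line that FIRSTOBS=2 tells SAS to skip
tail -n +2 shells.dat    # line 2 to the end: what SAS reads as observations
```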

The DATA step can also create new variables from mathematical operations on the original variables. Here I make two new variables, "loglength," which is the base-10 log of length, and "shellratio," the width divided by the length. SAS can do statistics on these variables just as it does on the original variables.


data mussel;
   infile 'shells.dat' firstobs=2;    
   input species $ length width;    
   loglength=log10(length);   
   shellratio=width/length;    

The PROC step

Once you've entered in the data, it's time to analyze it. This is done with one or more PROC commands. For example, to calculate the mean and standard deviation of the lengths, widths, and log-transformed lengths, you would use PROC MEANS:


proc means data=mussel mean std;    
   var length width loglength;         
   run;

PROC MEANS tells SAS which procedure to run. It is followed by certain options. DATA=MUSSEL tells it which data set to analyze. MEAN and STD are options that tell PROC MEANS to calculate the mean and standard deviation. On the next line, VAR LENGTH WIDTH LOGLENGTH tells PROC MEANS which variables to analyze. RUN tells it to run.

Now put it all together and run a SAS program. Connect to Strauss and use pico to create a file named "practice.sas". Copy and paste the following into the file:


data mussel;
   input species $ length width;
   loglength=log10(length);
   shellratio=width/length; 
   cards;
edulis 49.0 11.0
tross 51.2 9.1
tross 45.9 9.4
edulis 56.2 13.2
edulis 52.7 10.7
edulis 48.4 10.4
tross 47.6 9.5
tross 46.2 8.9
tross 37.2 7.1
;
proc means data=mussel mean std;
   var length width loglength;
   run;

Then exit pico (hit control-X). At the percent sign prompt, enter sas practice.sas. Then enter ls to list the file names; you should see new files named practice.log and practice.lst. First, enter cat practice.log to look at the log file. This will tell you whether there are any errors in your SAS program. Then enter cat practice.lst to look at the output from your program. You should see something like this:


                The SAS System 

              The MEANS Procedure

     Variable             Mean         Std Dev
     -----------------------------------------
     length         48.2666667       5.2978769
     width           9.9222222       1.6909892
     loglength       1.6811625       0.0501703
     -----------------------------------------

If you do, you've successfully run SAS. Yay!
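
Log files get long, so rather than reading all of practice.log with cat, you may want to pull out just the lines that signal trouble. The sketch below assumes that SAS starts error and warning lines in the log with the words ERROR and WARNING, and it also looks for the "Invalid data" note. The function name check_log is hypothetical, grep is a standard Unix command not covered elsewhere on this page, and the syntax is Bourne shell (sh or bash).

```shell
# check_log FILE: print the numbered lines of a SAS log that start with
# ERROR or WARNING, or that mention invalid data.
# (check_log is a hypothetical helper, not part of SAS or Unix.)
check_log() {
    if grep -n -E '^(ERROR|WARNING)|Invalid data' "$1"; then
        return 1      # something in the log needs attention
    else
        echo "no ERROR, WARNING or invalid-data lines in $1"
        return 0
    fi
}

# Only run the check if the log from the practice program is present.
if [ -f practice.log ]; then
    check_log practice.log
fi
```

A clean result here doesn't prove the analysis is right; it's still worth using PROC PRINT to confirm that SAS read the data the way you intended.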

PROC SORT and PROC PRINT

Specific statistical procedures are described on the web page for each test. Two that are of general use are PROC SORT and PROC PRINT. PROC SORT sorts the data by one or more variables. For some procedures, you need to sort the data first. PROC PRINT writes the data set, including any new variables you've created (like loglength and shellratio in our example) to the output file. You can use it to make sure that SAS has read the data correctly, and your transformations, sorting, etc. have worked properly. You can sort the data by more than one variable; this example sorts the mussel data, first by species, then by length.


proc sort data=mussel;
   by species length;
   run;
proc print data=mussel;
   run;

Adding PROC SORT and PROC PRINT to the SAS file produces the following output:


                       The SAS System 

    Obs    species    length    width    loglength    shellratio

     1     edulis      48.4      10.4     1.68485       0.21488 
     2     edulis      49.0      11.0     1.69020       0.22449 
     3     edulis      52.7      10.7     1.72181       0.20304 
     4     edulis      56.2      13.2     1.74974       0.23488 
     5     tross       37.2       7.1     1.57054       0.19086 
     6     tross       45.9       9.4     1.66181       0.20479 
     7     tross       46.2       8.9     1.66464       0.19264 
     8     tross       47.6       9.5     1.67761       0.19958 
     9     tross       51.2       9.1     1.70927       0.17773

As you can see, the data were sorted first by species, then within each species, they were sorted by length.

Graphs in SAS

It's possible to draw graphs with SAS, but I don't find it to be very easy. I recommend you take whatever numbers you need from SAS, put them into a spreadsheet or specialized graphing program, and use that to draw your graphs.

Getting data from a spreadsheet into SAS

I find it easiest to enter my data into a spreadsheet first, even if I'm going to analyze it using SAS. If you try to copy data directly from a spreadsheet into a SAS file, the numbers will be separated by tabs, which SAS will choke on; your log file will say "NOTE: Invalid data in line...". One way to fix this is to copy the data from the spreadsheet into a text editor (TextEdit on a Mac, Notepad on Windows), then do a search-and-replace to change all the tabs to spaces. You can then copy from the text editor and paste into the file you've opened on Strauss with pico. Another way to get rid of the tabs is to use the Save As... command in the spreadsheet program and save the spreadsheet as Space-delimited Text. After that, you open the file with TextEdit or Notepad, copy it, and paste it into your file on Strauss.
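
If the tab-separated text has already been pasted into a file on the Unix side, the tabs can also be replaced there. The sketch below uses tr, a standard Unix command not covered elsewhere on this page; the file name pasted.txt is hypothetical, the two data lines come from the mussel example, and the syntax is Bourne shell (sh or bash).

```shell
cd "$(mktemp -d)"    # scratch directory for this sketch

# Stand-in for data pasted from a spreadsheet: values separated by tabs.
printf 'edulis\t49.0\t11.0\ntross\t51.2\t9.1\n' > pasted.txt

# Replace every tab with a space and write the result to a new file.
tr '\t' ' ' < pasted.txt > shells.dat

cat shells.dat       # the values are now separated by single spaces
```

This only works cleanly if no cell in the spreadsheet is empty or contains a space, since SAS would then miscount the values on that line.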

If you're going to keep your data in a separate file from the SAS program and read it using an INFILE statement, you can use the DELIMITER option on the INFILE statement to tell SAS that the values are separated by tabs. Here I've made a file named shells.dat using a spreadsheet, in which the values are separated by tabs (represented as '09'x in SAS):


data mussel;
   infile 'shells.dat' delimiter='09'x;    
   input species $ length width;

If you have data separated by some other character, just put it in single quotation marks, such as DELIMITER='!' for data separated by exclamation marks.

More information about SAS

The user manuals for SAS are available online for free, which is nice. Unfortunately, they're in "frames" format, which makes it impossible to link to specific pages, so you won't see links to the appropriate topics in the manual in this handbook.

The UCLA Academic Technology Services has put together an excellent set of examples of how to do the most common statistical tests in SAS, SPSS or Stata; it's a good place to start if you're looking for more information about a particular test.



This page was last revised September 14, 2009. Its address is http://udel.edu/~mcdonald/statsasintro.html. It may be cited as pp. 300-307 in: McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.

©2009 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.