Getting started with SAS
SAS, SPSS and Stata are the most popular software packages for doing serious statistics. I have a little experience with SAS, so I've prepared this web page to get you started on the basics. UCLA's Academic Technology Services department has prepared very useful guides to SAS, SPSS and Stata.
SAS may seem intimidating and old-fashioned; accomplishing anything with it requires writing what is, in essence, a computer program, one where a misplaced semicolon can have disastrous results. But I think that if you take a deep breath and work your way patiently through the examples, you'll soon be able to do some pretty cool statistics.
The instructions here are for the University of Delaware, but most of them should apply anywhere that SAS is installed. There are three ways of using SAS:
- in batch mode. This is what I recommend, and this is what I'll describe below.
- interactively in line mode. I don't recommend this.
- interactively with the Display Manager System. From what I've seen, this isn't very easy. If you really want to try it, here are instructions. Keep in mind that "interactive" doesn't mean "user friendly graphical interface like you're used to"; you still have to write the same SAS programs.
SAS runs on a mainframe computer, not your personal computer. You'll have to connect to Strauss, one of the University of Delaware's mainframes. The operating system for Strauss is Unix; in order to run SAS on Strauss in batch mode, you'll have to learn a few Unix commands.
Getting connected to Strauss from a Mac
On a Mac, find the program Terminal; it should be in the Utilities folder, inside your Applications folder. You'll probably want to drag it to your Dock for easy access in the future. The first time you run Terminal, go to Preferences in the Terminal window and set "Declare terminal type ($TERM) as:" to "vt100". Then go to Window Settings in the Terminal menu, choose Keyboard from the pulldown menu at the top of the window, and check the box that says "Delete key sends backspace."
When you start up Terminal, you'll get a prompt that looks like this:
Your-Names-Computer:~ yourname$
After the dollar sign, type ssh userid@strauss.udel.edu, where userid is your UDelNet ID, and hit return. It will ask you for your password; enter it. You'll then be connected to Strauss, and you'll get this prompt:
strauss.udel.edu%
Getting connected to Strauss from Windows
On a Windows computer, see if the program SSH Secure Shell is on your computer, and if it isn't, download it from UDeploy. Start up the program, then click on "quick connect" and enter strauss.udel.edu for the host name and your UDelNet ID for the username. It will ask you for your password; if you enter it successfully, you'll get this prompt:
strauss.udel.edu%
Getting connected to Strauss from Linux
If you're running Linux, you're already enough of a geek that you don't need my help getting connected to the mainframe.
A little bit of Unix
The operating system on Strauss is Unix, so you've got to learn a few Unix commands. Unix was apparently written by people for whom typing is very painful, so most of the commands are a small number of cryptic letters. Case does matter; don't enter CD and think it means the same thing as cd. Here is all the Unix you need to know to run SAS. In the table below, each command is followed, where needed, by a file or directory name that you choose.
| ls | Lists all of the file names in your current directory. |
| pico filename | pico is a text editor; you'll use it for writing SAS programs. Enter pico practice.sas to open an existing file named practice.sas, or create it if it doesn't exist. To exit pico, press control-X. You have to use the arrow keys, not the mouse, to move around the text once you're in a file. For this reason, I prefer to create and edit SAS programs in a text editor on my computer (TextEdit on a Mac, Notepad on Windows), then copy and paste them into a file I've created with pico. I then use pico for minor tweaking of the program. Note that there are other popular text editors, such as vi and emacs, and one of the defining characteristics of a serious computer geek is a strong opinion about the superiority of their favorite text editor and total loserness of all other text editors. To avoid becoming one of them, try not to get emotional about pico. |
| cat filename | Displays the contents of a file on the screen. The whole file scrolls past and leaves you at the end, so you'll have to scroll up to read it. To print, you may want to copy what you want to a word processor file for easier formatting. |
| mv oldname newname | Changes the name of a file from oldname to newname. When you run SAS on the file practice.sas, the output will be in a file called practice.lst. Before you make changes to practice.sas and run it again, you may want to change the name of practice.lst to something else, so it won't be overwritten. |
| cp oldname newname | Makes a copy of file oldname with the name newname. |
| rm filename | Deletes a file. |
| logout | Logs you out of Strauss. |
| mkdir directoryname | Creates a new directory. You don't need to do this, but if you end up creating a lot of files, it will help keep them organized into different directories. |
| cd directoryname | Changes from one directory to another. For example, if you have a directory named sasfiles in your home directory, enter cd sasfiles. To go from within a directory up to your home directory, just enter cd. |
| rmdir directoryname | Deletes a directory, if it doesn't have any files in it. If you want to delete a directory and the files in it, first go into the directory, delete all the files in it using rm, then delete the directory using rmdir. |
| sas filename | Runs SAS. Be sure to enter sas filename.sas. If you just enter sas and then hit return, you'll be in interactive SAS mode, which is scary; enter ;endsas; if that happens and you need to get out of it. |
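The commands in the table can be tried out safely in a scratch directory before you use them on real files. The following is only a sketch: the directory name sasdemo and the file names are made up for illustration, and the two files are placeholders created with echo so that nothing here depends on SAS being installed. It also uses cd .., which moves up one directory level rather than all the way to your home directory.

```shell
# Make a scratch directory (the name sasdemo is arbitrary) and move into it.
mkdir sasdemo
cd sasdemo

# Placeholders standing in for a SAS program and the output file SAS would write.
echo "/* placeholder SAS program */" > practice.sas
echo "placeholder output" > practice.lst

# Rename the old output so the next SAS run won't overwrite it.
mv practice.lst practice_run1.lst

# Make a backup copy of the program, then list what is in the directory.
cp practice.sas practice_backup.sas
ls

# Display a file, then clean up: delete the files, move up one level,
# and remove the now-empty directory.
cat practice_run1.lst
rm practice.sas practice_backup.sas practice_run1.lst
cd ..
rmdir sasdemo
```

If rmdir complains that the directory is not empty, a file was left behind; use ls inside the directory to find it.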
Writing a SAS program
A SAS program consists of a series of commands. Each command is one or more words, followed by a semicolon. You can put comments into your program to remind you of what you're trying to do; these comments have a slash and asterisk on each side, like this:
/*This is a comment. It is not read by the SAS program.*/
The SAS program has two basic parts, the DATA step and the PROC step. (Note--I'll capitalize all SAS commands to make them stand out, but you don't have to when you write your programs.) The DATA step reads in data, either from another file or from within the program.
In a DATA step, you first say "DATA dataset;" where dataset is an arbitrary name you give the dataset. Then you say "INPUT variable1 variable2...;" giving an arbitrary name to each of the variables that is on a line in your data. So if you have a data set consisting of the length and width of mussels from two different species, you could start the program by writing:
data mussel;
   input species $ length width;
If you are putting the data directly in the program, the next step is a line that says "CARDS;", followed by the data, followed by a semicolon. Each observation is on a separate line, with the variables separated by one or more spaces:
data mussel;
   input species $ length width;
   cards;
edulis 49.0 11.0
trossulus 51.2 9.1
trossulus 45.9 9.4
edulis 56.2 13.2
edulis 52.7 10.7
edulis 48.4 10.4
trossulus 47.6 9.5
trossulus 46.2 8.9
trossulus 37.2 7.1
;
A variable name for a nominal variable (a name or character) has a space and a dollar sign ($) after it. If you want to treat a number as a nominal variable, such as an ID number, put a dollar sign after the name of the variable. Don't use spaces within the values of a variable; use Medulis or M_edulis, not M. edulis (there are ways of handling values containing spaces, but they're complicated).
If you have a large data set, it may be more convenient to keep it in a separate file from your program. To read in data from another file, use the INFILE statement, with the name of the data file in single quotes. In this example, I use the FIRSTOBS option to tell SAS that the first observation is on line 2 of the data file, because line 1 has column headings that remind me what the variables are. You don't have to do this, but I find it's a good idea to have one or more lines of explanatory information at the start of a data file; otherwise, it's too easy to forget what all the numbers are.
data mussel;
   infile 'shells.dat' firstobs=2;
   input species $ length width;
The DATA step can also create new variables from mathematical operations on the original variables, like this:
data mussel;
   infile 'shells.dat' firstobs=2;
   input species $ length width;
   loglength=log10(length);
   shellratio=width/length;
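To see what the two assignment statements compute, here is a rough cross-check done outside SAS for a single mussel with length 49.0 and width 11.0. It is only a sketch using awk, assuming a POSIX shell such as sh or bash (the $(...) syntax will not work in a C shell). awk has no log10 function, so the base-10 logarithm is computed as log(x)/log(10), and the variable name derived is arbitrary.

```shell
# log10(length) and width/length for one observation: length 49.0, width 11.0.
derived=$(echo "49.0 11.0" |
  awk '{ printf "%.5f %.5f\n", log($1) / log(10), $2 / $1 }')

# First number is loglength, second is shellratio.
echo "$derived"
```

This prints 1.69020 0.22449, the values SAS assigns to loglength and shellratio for that observation.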
The PROC step
Once you've entered in the data, it's time to analyze it. This is done with one or more PROC commands. For example, to calculate the mean and standard deviation of the lengths, widths, and log-transformed lengths, you would use PROC MEANS:
proc means data=mussel mean std;
   var length width loglength;
run;
PROC MEANS tells SAS which procedure to run. It is followed by certain options. DATA=MUSSEL tells it which data set to analyze. MEAN and STD are options that tell PROC MEANS to calculate the mean and standard deviation. On the next line, VAR LENGTH WIDTH LOGLENGTH tells PROC MEANS which variables to analyze. RUN tells it to run.
Now put it all together and run a SAS program. Connect to Strauss and use pico to create a file named "test.sas". Copy and paste the following into the file:
data mussel;
   input species $ length width;
   loglength=log10(length);
   shellratio=width/length;
   cards;
edulis 49.0 11.0
tross 51.2 9.1
tross 45.9 9.4
edulis 56.2 13.2
edulis 52.7 10.7
edulis 48.4 10.4
tross 47.6 9.5
tross 46.2 8.9
tross 37.2 7.1
;
proc means data=mussel mean std;
   var length width loglength;
run;
Then exit pico (hit control-X). At the strauss.udel.edu% prompt, enter sas test.sas. Then enter ls to list the file names; you should see new files named test.log and test.lst. First, enter cat test.log to look at the log file. This will tell you whether there are any errors in your SAS program. Then enter cat test.lst to look at the output from your program. You should see something like this:
The SAS System
The MEANS Procedure
Variable Mean Std Dev
-----------------------------------------
length 48.2666667 5.2978769
width 9.9222222 1.6909892
loglength 1.6811625 0.0501703
-----------------------------------------
If you do, you've successfully run SAS. Yay!
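If you'd like some reassurance that the numbers mean what you think they mean, the mean and standard deviation of the nine lengths can be recomputed outside SAS. This is a sketch using awk, assuming a POSIX shell such as sh or bash; the variable name result is arbitrary. It uses the sample standard deviation, with n-1 in the denominator.

```shell
# Mean and sample standard deviation (n-1 denominator) of the nine shell
# lengths from test.sas, computed with awk instead of SAS.
result=$(printf '%s\n' 49.0 51.2 45.9 56.2 52.7 48.4 47.6 46.2 37.2 |
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END {
         mean = sum / n
         sd = sqrt((sumsq - n * mean * mean) / (n - 1))
         printf "%.4f %.4f\n", mean, sd
       }')

# First number is the mean, second is the standard deviation.
echo "$result"
```

This prints 48.2667 5.2979, which agrees with the Mean and Std Dev that SAS reports for length.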
PROC SORT and PROC PRINT
Specific statistical procedures are described on the web page for each test. Two that are of general use are PROC SORT and PROC PRINT. PROC SORT sorts the data by one or more variables. For many procedures, you need to sort the data first. PROC PRINT writes the data set to the output file. You can use it to make sure that your transformations, sorting, etc. have worked properly. You can sort the data by more than one variable; this example sorts the mussel data, first by species, then by length.
proc sort data=mussel;
   by species length;
run;
proc print data=mussel;
run;
Adding PROC SORT and PROC PRINT to the SAS file produces the following output:
The SAS System
Obs species length width loglength shellratio
1 edulis 48.4 10.4 1.68485 0.21488
2 edulis 49.0 11.0 1.69020 0.22449
3 edulis 52.7 10.7 1.72181 0.20304
4 edulis 56.2 13.2 1.74974 0.23488
5 tross 37.2 7.1 1.57054 0.19086
6 tross 45.9 9.4 1.66181 0.20479
7 tross 46.2 8.9 1.66464 0.19264
8 tross 47.6 9.5 1.67761 0.19958
9 tross 51.2 9.1 1.70927 0.17773
As you can see, the data were sorted first by species, then within each species, they were sorted by length.
Graphs in SAS
It's possible to draw graphs with SAS, but I don't find it to be very easy. I recommend you take whatever numbers you need from SAS, put them into a spreadsheet, and use that to draw your graphs.
Getting data from a spreadsheet into SAS
I find it easiest to enter my data into a spreadsheet first, even if I'm going to analyze it using SAS. If you try to copy data directly from a spreadsheet into a SAS file, the numbers will be separated by tabs, which SAS will choke on. One way to fix this is to copy the data from the spreadsheet into a text editor (TextEdit on a Mac, Notepad on Windows), then do a search-and-replace to change all the tabs to spaces. You can then copy from the text editor and paste into the file you've opened on Strauss with pico. Another way to get rid of the tabs is to use the Save As... command and save the spreadsheet as Space-delimited Text. After that, you open the file with TextEdit or Notepad, copy it, and paste it into your file on Strauss.
If you're going to keep your data in a separate file from the SAS program and read it using an INFILE statement, you can use the DELIMITER option to tell SAS that the values are separated by tabs. Here I've made a file named shells.dat using a spreadsheet, in which the values are separated by tabs (represented as '09'x in SAS):
data mussel;
   infile 'shells.dat' delimiter='09'x;
   input species $ length width;
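If a file containing tabs has already ended up on a Unix machine, the tabs can also be replaced there with the standard tr command. This is a sketch rather than part of the procedure above: shells_tabs.txt is a made-up file name, and the three lines written with printf are a small stand-in for data pasted from a spreadsheet.

```shell
# Build a small tab-separated sample (shells_tabs.txt is a made-up name)
# standing in for data pasted from a spreadsheet.
printf 'species\tlength\twidth\n' > shells_tabs.txt
printf 'edulis\t49.0\t11.0\n' >> shells_tabs.txt
printf 'tross\t51.2\t9.1\n' >> shells_tabs.txt

# tr translates every tab character into a single space.
tr '\t' ' ' < shells_tabs.txt > shells.dat

# Show the converted file.
cat shells.dat
```

Because the sample keeps a line of column headings at the top, a SAS program reading shells.dat would need FIRSTOBS=2 on its INFILE statement.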
If you have data separated by some other character, just put it in single quotation marks, such as DELIMITER='!' for data separated by exclamation marks.
More information about SAS
The user manuals for SAS are available online for free, which is nice. Unfortunately, they're in "frames" format, which makes it impossible to link to specific pages, so you won't see links to the appropriate topics in the manual in this handbook.
The UCLA Academic Technology Services has put together an excellent set of examples of how to do the most common statistical tests in SAS, SPSS or Stata; it's a good place to start if you're looking for more information about a particular test.
This page was last revised August 13, 2008. Its address is http://udel.edu/~mcdonald/statsasintro.html.
©2007-2008 by John H. McDonald. You can probably do what you want with this content; see the permissions page at http://udel.edu/~mcdonald/statpermissions.html for details.