Introduction to SAS

What is SAS?

SAS is a system of software for data access, management, analysis, and presentation available at Dartmouth on the research computing Unix machine Teton.  Built in modules, SAS can be used for (among other things) statistics, graphing, operations research, mapping, and matrix manipulation.  SAS is a powerful, but complicated, tool for data analysis that is basically command driven rather than menu driven.  This minicourse will focus on the basics of using SAS at Dartmouth for statistical analyses: running the program, working with data files, setting up simple SAS programs, and computing some descriptive statistics.
 

Running SAS at Dartmouth

You will need an account on Teton.  Accounts are generally available to faculty and graduate students, with limited access for undergraduates.  You will also need an X terminal emulator (such as ReflectionX) and secure connection software (such as SSH Secure Shell) for connecting to the Unix machines.



Some SAS windows

Program editor
Used for data entry and/or writing and executing programs with SAS commands.

Log
A journal that records what's happening with a submitted SAS program.  Useful for debugging.

Output
Shows the output of SAS procedures.
 

Some SAS menus

Help
SAS system, sample programs

View
Switches among various SAS windows

File
Open opens a text file (data or program) from your Unix account
Save saves to the last file SAS opened from your Unix account (be careful: this overwrites that file)
Save as saves to your Unix account
Exit quits SAS

Edit
Find finds text
Change finds and replaces text
Clear text clears the text from the program editor (file remains open)

Run
Submit executes the current SAS program
Recall text brings back the last program submitted



Using the program editor

The SAS program editor is a simple text (ASCII) line editor that's not so hot.  If you don't like it, write your SAS program in another editor, save it as a text file, and open it in the SAS program editor window.

Copy line commands: C, CC
Delete line commands: D, DD
Insert line command: I
Move line commands: M, MM
 

Data files

SAS basically works with two types of data files.  Raw data files are simply text files where rows are cases and columns (or sets of columns) are variables.  SAS data files are raw data files that have been transformed by the SAS system and are generally usable only by SAS.  The examples in these notes use data from a raw data file hsb.dta that has information on 600 high school seniors, with variables in the columns described in the codebook below.
 

HSB Codebook

Variable  Description              Columns
ID        ID NUMBER                1-3
SEX                                9
RACE                               17
SES       SOCIO-ECONOMIC STATUS    25
SCTYP     SCHOOL TYPE              33
HSP       HIGH SCHOOL PROGRAM      41
LOCUS     LOCUS OF CONTROL         49-53
CONCPT    SELF CONCEPT             57-61
MOT       MOTIVATION               65-68
CAR       CAREER CHOICE            73-74
RDG       READING SCORE            81-84
WRTG      WRITING SCORE            89-92
MATH      MATH SCORE               97-100
SCI       SCIENCE SCORE            105-108
CIV       CIVICS SCORE             113-116
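
To make the difference between a raw data file and a SAS data file concrete, here is a minimal sketch that reads hsb.dta using the column positions in the codebook and stores the result as a permanent SAS data set.  The library name "mylib" and the directory "~/sasdata" are made up for illustration; substitute a directory that exists in your own account.

```sas
/* "mylib" and the directory ~/sasdata are hypothetical names;   */
/* the directory must already exist before the libname statement */
libname mylib "~/sasdata";

/* read the raw file once, using the column positions from the codebook */
data mylib.hsb;
   infile "public/hsb.dta" missover;
   input id 1-3  sex 9  race 17  ses 25  sctyp 33  hsp 41  locus 49-53
         concpt 57-61  mot 65-68  car 73-74  rdg 81-84  wrtg 89-92
         math 97-100  sci 105-108  civ 113-116;
run;

/* list the variables stored in the permanent SAS data set */
proc contents data=mylib.hsb;
run;

/* procedures can now read the stored data set directly, */
/* without re-reading the raw file                       */
proc means data=mylib.hsb n mean;
   var rdg wrtg;
run;
```

In a later session, repeating just the libname statement should be enough to get at mylib.hsb again, which saves re-running the input step each time.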



Basic structure of a SAS program

Inputting raw data:  This depends on the format of the data file.
Transforming data:  Recoding data, computing new variables
Running procedures:  Analyzing data
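
The three parts above can be sketched in a few lines.  This is only an illustration: the data set name "demo" and the three records are made up, and the data are typed inline with a datalines statement so the sketch runs without any external file.

```sas
/* 1. inputting raw data (inline here, so no external file is needed) */
data demo;
   input id rdg wrtg;

   /* 2. transforming data: compute a new variable */
   rw = rdg + wrtg;

   datalines;
1 52 49
2 61 58
3 47 55
;
run;

/* 3. running a procedure: summary statistics for all three variables */
proc means data=demo n mean min max;
   var rdg wrtg rw;
run;
```

With a real data file, the datalines block would be replaced by an infile statement pointing at the file.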



Working with output

Except for the output of SAS/GRAPH procedures, most SAS output is in a simple text format.  That means SAS programs, logs, and most output can be saved as text files, edited with a word processor, and printed from there.

SAS output can also be saved as a file, then opened in the SAS program editor.  Changes can be made there, and the results printed from the File menu.  Given that the SAS editor is quite primitive, this isn't always a great idea.

Output can also be printed directly from the output window's File menu.

SAS itself uses host printing.  In order to print from SAS on Teton, you will have to define a default printer.  This involves:

  1) selecting Print from the File menu
  2) selecting Setup from the Print menu
  3) selecting New from the Printer Setup menu
  4) giving the printer a name (e.g., biz)
  5) selecting the type of printer from the list provided
  6) specifying the Unix print command for the printer (e.g., /bin/lp -dbiz) and routing output there
  7) then printing

This is of course tedious, and unfortunately SAS may not save these settings for you from one session to the next.  There's a lot of information you need to know and plenty of chances for error along the way.

So:  I'd recommend saving your output to a file and working from there.
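
Besides the Save as item in the File menu, one way to get output into a file is from within the program itself, using the printto procedure.  The sketch below is a suggestion rather than the only way to do it: the two file names are made up, and sashelp.class is a small sample data set shipped with SAS that stands in for your own data.

```sas
/* route procedure output and the log to text files;          */
/* both file names are hypothetical, and "new" overwrites any */
/* existing file of the same name                             */
proc printto print="hsb_output.lst" log="hsb_log.log" new;
run;

/* any procedure output produced now goes to hsb_output.lst */
proc means data=sashelp.class n mean std min max;
   var height weight;
run;

/* restore the default destinations (the Output and Log windows) */
proc printto;
run;
```

The resulting files are plain text, so they can be edited or printed with whatever Unix tools you prefer.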


SAS Help at Dartmouth College

Help:  SAS has pretty good help facilities, including sample programs.

Tutorial:  There is a tutorial under the Help menu.

Manuals:  The library has a number of SAS manuals.

Books:  If you plan on working with SAS a lot, there are "how to" books out there.

Other users:  Sometimes they know more about SAS than I do.

SAS web site:  Has FAQs and support for extremely technical questions.

Research Computing web site:  ~rc


 

Richard Barton
Berry 179C
646-0255


 A Sample Annotated SAS Program

  /* this is a comment, and can be placed almost anywhere in */
  /* your program (except the 1st 2 columns) */
  /* comments here are indented to set them off from the SAS*/
  /* statements, but need not be indented in your program*/

filename in1 "public/hsb.dta";

  /* the filename statement associates an internal name "in1"*/
  /* with the external raw data file "hsb.dta", which lives in a*/
  /* subdirectory "public" in my Unix home directory*/

options pagesize=55;

  /* an options statement can be used to change the default*/
  /* SAS system options*/

data one; infile in1 missover;
input
id sex race ses sctyp hsp locus concpt mot car rdg wrtg math sci civ;

  /* a lot going on here, in a data step:*/
  /*                                                  */
  /* the data statement gives the name "one" to a SAS data set*/
  /* the infile statement associates the internal file "in1"*/
  /* from above with this SAS data set.  the missover option*/
  /* prevents SAS from wrapping around should there be some*/
  /* missing data in a record.*/
  /* the input statement tells SAS the order of the variables*/
  /* in the file in1.  With no columns or delimiters defined,*/
  /* the implied data format is free-field.*/

  /* an alternative input statement, using fixed format:*/
  /*                                                                          */
  /* input*/
  /* id 1-3  sex 9  race 17  ses 25  sctyp 33  hsp 41  locus 49-53*/
  /* concpt 57-61  mot 65-68  car 73-74  rdg 81-84  wrtg 89-92*/
  /* math 97-100  sci 105-108  civ 113-116;*/

rw=rdg+wrtg;

  /* computing a new variable rw as the sum of rdg and wrtg*/
  /* if either is a missing value, so is rw*/

avtscore=mean(rdg,wrtg,math,sci,civ);

  /* computing a new variable using a SAS function*/
  /* it will have a missing value only if all of the arguments*/
  /* have missing values*/
  /* SAS has a zillion available functions*/

if race=4 then nrace=1;
else if (race>=1 and race<=3) then nrace=2;

  /* recoding race into a new variable nrace*/
  /* uses an if-then-else structure and numeric comparisons*/
  /* data transformations must take place while a data step is in*/
  /* effect. a data step is in effect until a "proc" or another*/
  /* data step is encountered.*/

proc print data=one;
     var id race nrace;

  /* the print procedure is used to print selected variable values*/
  /* for all cases. the data option selects the SAS data set to be */
  /* used (programs often use multiple data sets). the var*/
  /* subcommand says what variables to print. this is useful for*/
  /* checking data.*/

proc freq data=one;
     tables sex race sex*nrace;

  /* the freq procedure is used to get frequency tables for selected*/
  /* variables.  it can get one-way tables or two-way tables*/
  /* (crosstabulations).  one-way tables can be useful for checking*/
  /* for mistakes in your data.*/

proc univariate data=one;
     var rdg;

  /* the univariate procedure gets you a variety of summary statistics */
  /* for variables, including moments, quantiles, and extremes.  it*/
  /* can also be used to get frequency tables and some simple plots.*/

proc means data=one n mean std min max;
     var locus concpt mot;

  /* the means procedure gets you a variety of summary statistics*/
  /* in an efficient manner.  options let you specify the statistics*/
  /* you want.  getting mins and maxes can also be helpful for*/
  /* screening your data for mistakes.*/

proc sort data=one;
     by sex;

proc means data=one n mean std min max;
     var locus--mot;
     by sex;

  /* the sort procedure sorts cases by specified variables; one can*/
  /* sort in either ascending or descending order.  after sorting,*/
  /* other procedures can be used with the "by" subcommand to get you*/
  /* summary statistics for subgroups.*/
  /*                                                  */
  /* note the use of the double-dash to get an inclusive list of*/
  /* variables, in the order they appear in the data set.*/

proc corr data=one;
     var rdg--civ;

  /* the corr procedure gets you a correlation matrix among variables.*/
  /* different types of correlations (e.g., Pearson, Spearman) and*/
  /* related measures (e.g., covariances, partial correlations,*/
  /* Cronbach's alpha) can be obtained with options*/

proc plot data=one;
     plot rdg*wrtg;

  /* the plot procedure gets you a scatterplot for two variables.*/
  /* output is in a "text" format.  a variety of options are*/
  /* available for formatting the output.*/

proc chart data=one;
     vbar ses/discrete;
     vbar ses/group=nrace discrete;

  /* the chart procedure gets you different types of graphs for*/
  /* displaying data, including vertical and horizontal bar charts,*/
  /* pie charts, and histograms.  a variety of options are available*/
  /* for customizing the graphs.  results are "text" format.*/
  /* if you want to get real (i.e., not text) plots and charts, you*/
  /* will want to use SAS/GRAPH procedures.  the results are much*/
  /* better quality, but the procedures have more options and the*/
  /* programming is more complicated.*/

run;

  /* the run command tells SAS to execute the previous commands.*/
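
The comments on rw and avtscore in the program above describe how missing values behave differently with the + operator and with the mean function.  A small check like the following makes the difference visible.  The data set name "check" and the three records are made up for illustration; a lone period stands for a missing value.

```sas
/* made-up scores for three hypothetical students; "." is missing */
data check;
   input id rdg wrtg;
   rw  = rdg + wrtg;        /* missing if either rdg or wrtg is missing */
   avg = mean(rdg, wrtg);   /* missing only if both are missing         */
   datalines;
1 50 60
2 50 .
3 . .
;
run;

/* print all cases to compare rw and avg side by side */
proc print data=check;
run;
```

In the printed listing, case 1 has values for both rw and avg, case 2 has a missing rw but an avg based on the one score available, and case 3 is missing on both.  The log also carries a note that missing values were generated, which is worth looking for when checking your own programs.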