What is SAS?
SAS is a system of software for data access, management, analysis, and
presentation available at Dartmouth on the research computing Unix machine
Teton. Built in modules, SAS can be used for (among other things) statistics,
graphing, operations research, mapping, and matrix manipulation. SAS
is a powerful, but complicated, tool for data analysis that is basically command
driven rather than menu driven. This minicourse will focus on the basics
of using SAS at Dartmouth for statistical analyses: running the program,
working with data files, setting up simple SAS programs, and computing some
descriptive statistics.
Running SAS at Dartmouth
You will need an account on Teton. Accounts are generally available to faculty and graduate students, with limited access for undergraduates. You will also need an X terminal emulator (such as ReflectionX) and secure connection software (such as SSH Secure Shell) to connect to the Unix machines.
Program editor
Used for data entry and/or writing and executing programs with SAS commands.
Log
A journal that records what's happening with a submitted SAS program.
Useful for debugging.
Output
Shows the output of SAS procedures.
Some SAS menus
Help
SAS system, sample programs
View
Switches among various SAS windows
File
Open: opens a text file (data or program) from your Unix account
Save: saves to the last file SAS opened from your Unix account (be careful)
Save as: saves to your Unix account
Exit: quits SAS
Edit
Find: finds text
Change: finds and replaces text
Clear text: clears the text from the program editor (file remains open)
Run
Submit: executes the current SAS program
Recall text: brings back the last program submitted
The SAS program editor is a simple text (ASCII) line editor, and not a very good one. If you don't like it, write your SAS program in another editor, save it as a text file, and open it in the SAS program editor window. Within the SAS editor, the following line commands are available:
Copy line commands: C, CC
Delete line commands: D, DD
Insert line command: I
Move line commands: M, MM
Data files
SAS basically works with two types of data files. Raw data files
are simply text files where rows are cases and columns (or sets of columns)
are variables. SAS data files are raw data files that have been transformed
by the SAS system and are generally usable only by SAS. The examples in
these notes use data from a raw data file hsb.dta that has information on
600 high school seniors, with variables in the columns described in the codebook
below.
HSB Codebook
| Variable | Label | Columns |
| --- | --- | --- |
| ID | ID NUMBER | 1-3 |
| SEX | | 9 |
| RACE | | 17 |
| SES | SOCIO-ECONOMIC STATUS | 25 |
| SCTYP | SCHOOL TYPE | 33 |
| HSP | HIGH SCHOOL PROGRAM | 41 |
| LOCUS | LOCUS OF CONTROL | 49-53 |
| CONCPT | SELF CONCEPT | 57-61 |
| MOT | MOTIVATION | 65-68 |
| CAR | CAREER CHOICE | 73-74 |
| RDG | READING SCORE | 81-84 |
| WRTG | WRITING SCORE | 89-92 |
| MATH | MATH SCORE | 97-100 |
| SCI | SCIENCE SCORE | 105-108 |
| CIV | CIVICS SCORE | 113-116 |
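As a rough sketch of how the codebook columns map onto a program, the step below reads hsb.dta with column (fixed-format) input and stores the result as a permanent SAS data file. The library name "mylib", the data set name "hsb", and the choice of the "public" directory as the storage location are assumptions made for illustration; the relative paths assume SAS was started from your home directory.

```sas
/* assumption: SAS was started from the home directory, so the  */
/* relative path "public" resolves to the subdirectory there    */
filename in1 "public/hsb.dta";
libname mylib "public";       /* hypothetical library name */

/* a two-level name (library.dataset) makes the data set permanent */
data mylib.hsb;
  infile in1 missover;
  /* column positions are taken from the HSB codebook */
  input id 1-3 sex 9 race 17 ses 25 sctyp 33 hsp 41
        locus 49-53 concpt 57-61 mot 65-68 car 73-74
        rdg 81-84 wrtg 89-92 math 97-100 sci 105-108 civ 113-116;
run;

/* list the variables stored in the new SAS data file */
proc contents data=mylib.hsb;
run;
```

In a later session, the saved file could then be used directly (for example, proc means data=mylib.hsb;) after reissuing the libname statement, without rereading the raw data.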
Inputting raw data: this depends on the format of the data file.
Transforming data: recoding data, computing new variables
Running procedures: analyzing data
Except for the output of SAS/GRAPH procedures, most SAS output is in a simple text format. That means SAS programs, logs, and most output can be saved as text files, edited with a word processor, and printed from there.
SAS output can also be saved as a file, then opened in the SAS program editor. Changes can be made there, and the results printed from the File menu. Given that the SAS editor is quite primitive, this isn't always a great idea.
Output can also be printed directly from the output window's File menu.
SAS itself uses host printing. In order to print from SAS and Teton, you will have to define a default printer. This involves:
1) selecting Print from the File menu
2) selecting Setup from the Print menu
3) selecting New from the Printer Setup menu
4) giving the printer a name (e.g., biz)
5) selecting the type of printer from the list provided
6) specifying the Unix path for the printer (e.g., /bin/lp -dbiz) and routing output there
7) then printing
This is of course tedious, and unfortunately SAS may not save these settings for you from one session to the next. There's a lot of information you need to know and plenty of chances for error along the way.
So: I'd recommend saving your output to a file and working from there.
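Besides the File menu, one way to get output into a file is from within the program itself, using the printto procedure. The sketch below is only an illustration: the file name "sas_output.lst" is made up, and sashelp.class is a small sample data set that ships with SAS, used here so the example does not depend on any other program.

```sas
/* route procedure output to a text file instead of the output window */
/* "sas_output.lst" is a hypothetical file name; new overwrites any   */
/* existing file of that name rather than appending to it             */
proc printto print="sas_output.lst" new;
run;

/* any procedure output produced now goes to the file */
proc means data=sashelp.class n mean std min max;
run;

/* printto with no options restores the default destination */
proc printto;
run;
```

The resulting file is plain text, so it can be opened in any editor or word processor and printed from there.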
Help: SAS has help facilities that are pretty good, with sample programs
Tutorial: there is a tutorial under the Help menu
Manuals: the library has a number of SAS manuals
Books: if you plan on working with SAS a lot, there are "how to" books out there
Other users: sometimes they know more about SAS than I do
SAS web site: has FAQs and support for the extremely technical questions
Research Computing web site ~rc
Richard Barton
Berry 179C
646-0255
/* this is a comment, and can be placed almost anywhere in */
/* your program (except the 1st 2 columns) */
/* comments here are in boldface, but would not be bold*/
/* in your program*/
filename in1 "public/hsb.dta";
/* the filename statement associates the internal name "in1"*/
/* with the external raw data file "hsb.dta", which lives in a*/
/* subdirectory "public" in my Unix home directory*/
options pagesize=55;
/* an options statement can be used to change the default*/
/* SAS system options*/
data one; infile in1 missover;
input
id sex race ses sctyp hsp locus concpt mot car rdg wrtg math sci civ;
/* a lot going on here, in a data step:*/
/* */
/* the data statement gives the name "one" to a SAS data set*/
/* the infile statement associates the internal file "in1"*/
/* from above with this SAS data set. the missover option*/
/* prevents SAS from wrapping around should there be some*/
/* missing data in a record.*/
/* the input statement tells SAS the order of the variables*/
/* in the file in1. With no columns or delimiters defined,*/
/* the implied data format is free-field.*/
/* an alternative input statement, using fixed format:*/
/* */
/* input*/
/* id 1-3 sex 9 race 17 ses 25 sctyp 33 hsp 41 locus 49-53*/
/* concpt 57-61 mot 65-68 car 73-74 rdg 81-84 wrtg 89-92*/
/* math 97-100 sci 105-108 civ 113-116;*/
rw=rdg+wrtg;
/* computing a new variable rw as the sum of rdg and wrtg*/
/* if either is missing, so is rw*/
avtscore=mean(rdg,wrtg,math,sci,civ);
/* computing a new variable using a SAS function*/
/* it will have a missing value only if all of the arguments*/
/* have missing values*/
/* SAS has a zillion available functions*/
if race=4 then nrace=1;
else if (race>=1 and race<=3) then nrace=2;
/* recoding race into a new variable nrace*/
/* uses an if-then-else structure and numeric comparisons*/
/* data transformations must take place while a data step is in*/
/* effect. a data step is in effect until a "proc" or another*/
/* data step is encountered.*/
proc print data=one;
var id race nrace;
/* the print procedure is used to print selected variable values*/
/* for all cases. the data option selects the SAS data set to be*/
/* used (programs often use multiple data sets). the var*/
/* subcommand says what variables to print. this is useful for*/
/* checking data.*/
proc freq data=one;
tables sex race sex*nrace;
/* the freq procedure is used to get frequency tables for selected*/
/* variables. it can get one-way tables or two-way tables*/
/* (crosstabulations). one-way tables can be useful for checking*/
/* for mistakes in your data.*/
proc univariate data=one;
var rdg;
/* the univariate procedure gets you a variety of summary statistics*/
/* for variables, including moments, quantiles, and extremes. it*/
/* can also be used to get frequency tables and some simple plots.*/
proc means data=one n mean std min max;
var locus concpt mot;
/* the means procedure gets you a variety of summary statistics*/
/* in an efficient manner. options let you specify the statistics*/
/* you want. getting mins and maxes can also be helpful for*/
/* screening your data for mistakes.*/
proc sort data=one;
by sex;
proc means data=one n mean std min max;
var locus--mot;
by sex;
/* the sort procedure sorts cases by specified variables; one can*/
/* sort in either ascending or descending order. after sorting,*/
/* other procedures can be used with the "by" subcommand to get you*/
/* summary statistics for subgroups.*/
/* */
/* note the use of the double dash to get an inclusive list of*/
/* variables.*/
proc corr data=one;
var rdg--civ;
/* the corr procedure gets you a correlation matrix among variables.*/
/* different types of correlations (e.g., Pearson, Spearman) and*/
/* related measures (e.g., covariances, partial correlations,*/
/* Cronbach's alpha) can be obtained with options.*/
proc plot data=one;
plot rdg*wrtg;
/* the plot procedure gets you a scatterplot for two variables.*/
/* output is in a "text" format. a variety of options are*/
/* available for formatting the output.*/
proc chart data=one;
vbar ses/discrete;
vbar ses/group=nrace discrete;
/* the chart procedure gets you different types of graphs for*/
/* displaying data, including vertical and horizontal bar charts,*/
/* pie charts, and histograms. a variety of options are available*/
/* for customizing the graphs. results are in "text" format.*/
/* if you want to get real (i.e., not text) plots and charts, you*/
/* will want to use SAS/GRAPH procedures. the results are much*/
/* better quality, but the procedures have more options and the*/
/* programming is more complicated.*/
run;
/* the run command tells SAS to execute the previous commands.*/
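The comments above about missing values (a sum with + is missing if any operand is missing, while the mean function is missing only if every argument is missing) can be checked with a tiny stand-alone program. This is a sketch using made-up data: the data set name "demo", the variable name "avg", and the three data lines are hypothetical and have nothing to do with hsb.dta. A period in the data stands for a missing value.

```sas
/* hypothetical three-case data set, typed inline with datalines */
data demo;
  input rdg wrtg;
  rw  = rdg + wrtg;        /* missing if either rdg or wrtg is missing */
  avg = mean(rdg, wrtg);   /* missing only if both are missing */
  datalines;
50 60
50 .
. .
;
run;

/* print all cases to compare rw and avg */
proc print data=demo;
run;
```

In the printed output, the second case should show a missing rw but an avg of 50, and the third case should be missing on both. The log also notes that missing values were generated, which is a useful thing to watch for in your own programs.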