Doing More with SAS

Today's topics

1)  Data manipulation
2)  Basic inferential statistics
3)  Intermediate inferential statistics
4)  SAS Graph procedures
5)  Permanent SAS files

 

HSB Codebook

Variable   Label                     Columns

ID         ID NUMBER                 1-3
SEX                                  9
RACE                                 17
SES        SOCIO-ECONOMIC STATUS     25
SCTYP      SCHOOL TYPE               33
HSP        HIGH SCHOOL PROGRAM       41
LOCUS      LOCUS OF CONTROL          49-53
CONCPT     SELF CONCEPT              57-61
MOT        MOTIVATION                65-68
CAR        CAREER CHOICE             73-74
RDG        READING SCORE             81-84
WRTG       WRITING SCORE             89-92
MATH       MATH SCORE                97-100
SCI        SCIENCE SCORE             105-108
CIV        CIVICS SCORE              113-116

 





Data Manipulation

Combinations of subsetting, merging, dropping variables, and outputting from procedures allow the user to create the desired data set, which can then be saved permanently to a new data file.

Subsets

Subsets of cases
   Done within a data step with a "set" command
   Use a combination of conditionals, Boolean operators, and either "output" or "delete"

Subsets of variables
   Done within a data step with a "set" command
   "Keep" the variables you want

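As a minimal sketch, both kinds of subsetting can be combined in a single data step. It assumes the data set "one" from the sample program later in this handout already exists; the name pub_scores is made up for illustration.

```sas
  /* keep only public-school cases (sctyp=1) in the general or */
  /* academic program, and only the id and test score variables */
data pub_scores; set one;
   if sctyp=1 and hsp in (1,2);   /* subsetting if: other cases are dropped */
   keep id rdg wrtg math sci civ;
run;

proc print data=pub_scores (obs=10);
run;
```

An "if" with no "then" keeps only the cases for which the condition is true, so it works like "if ... then output" when that is the only output rule in the step.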
 

Merging

Concatenating data sets
   Use the "set" command in a data step, naming the data sets to be concatenated

Matched merging
   Sort the data sets by the variable or variables you want to match by
   "Merge" the data sets in a data step by the desired variables
   Beware of duplicates
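
One way to watch for duplicates and unmatched cases in a matched merge is sketched below. The two small data sets and all their values are made up for illustration; id 2 appears twice in "scores" and id 3 does not appear there at all.

```sas
data names;
   input id name $;
   datalines;
1 ANN
2 BOB
3 CAL
;

data scores;
   input id math;
   datalines;
1 50
2 61
2 47
;

proc sort data=names;  by id;
proc sort data=scores; by id;

data both;
   merge names scores (in=in_s);
   by id;
   dup   = not (first.id and last.id);   /* 1 if this id occurs more than once */
   no_sc = not in_s;                     /* 1 if the id has no record in scores */
run;

proc print data=both;
run;
```

In the printed result, both records for id 2 carry the same name and are flagged with dup=1, and id 3 is flagged with no_sc=1 and has a missing math score.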

 

Summary data sets

Some procedures (e.g., proc means) can create an output data set of summary statistics: name the data set and the desired variables (e.g., mean=mean1) on an "output" subcommand of the procedure. To get the statistics by subgroups, first sort the cases with Proc sort, then add a "by" subcommand to the procedure.
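
As an alternative sketch (not the method used in the sample program, which sorts and uses "by"), proc means can also form subgroups with a "class" subcommand, which does not require sorting first. It assumes the data set "one" exists; the names mn_class, n_math, mean_math, and sd_math are made up for illustration.

```sas
proc means data=one noprint nway;
   class hsp sex;
   var math;
   output out=mn_class n=n_math mean=mean_math std=sd_math;
run;

proc print data=mn_class;
run;
```

The "nway" option limits the output data set to the hsp-by-sex combinations; without it, proc means also writes rows for the overall sample and for each class variable alone.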

 

Creating permanent text files

Associate a filename with the output file
Use a data step with a "file" statement to associate a data set with the filename
Use a "put" statement to place the variables in the desired locations within a record

 


 Basic Inferential Statistics

Proc corr: analyze relationships between variables
Pearson vs. Spearman correlations
Significance: 1 or 2 tailed p-values
listwise or pairwise deletion
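
The two deletion methods can be compared with a sketch like the following, assuming the data set "one" from the sample program exists. Proc corr uses pairwise deletion unless the "nomiss" option is given.

```sas
  /* default: pairwise deletion (each correlation uses all cases */
  /* that have values on that pair of variables) */
proc corr data=one pearson nosimple;
   var rdg wrtg math sci civ;
run;

  /* nomiss: listwise deletion (a case is dropped from every */
  /* correlation if it is missing on any variable in the var list) */
proc corr data=one pearson nosimple nomiss;
   var rdg wrtg math sci civ;
run;
```

The p-values that proc corr prints are two-tailed.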

 

Proc freq: analyze differences in proportions

Statistics
    Chi-squared
    phi and Cramer's V

Cells
   Observed
   Expected
   Row, column, total percents
   Residuals: cellchi2

 

Proc means: analyze differences in means
Paired-samples t-test (dependent t-test)
   Two variables
   Confidence interval
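
As an alternative to running proc means on difference scores, the sketch below uses the "paired" subcommand of proc ttest, which is available in more recent SAS releases. It assumes the data set "one" exists.

```sas
proc ttest data=one;
   paired rdg*wrtg rdg*math;
run;
```

Each pair listed gets its own t-test, along with a confidence interval for the mean difference.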

 

Proc ttest: analyze differences in means
Independent samples t-test
   2-groups
   Define groups

 


 Intermediate Inferential Statistics

Analysis of Variance (ANOVA): analyze differences among means


One-way factorial ANOVA: Proc GLM
Differences between group means (like an extension of independent t-test)
Compare means: One-way ANOVA
Post hoc tests show pairwise comparisons while controlling for the number of tests being conducted

Repeated measures ANOVA : Proc GLM
Differences among means for variables (like an extension of dependent t-test)
Often used for analyzing differences over time
Specify the factor name and number of levels
 

Regression: forming models for prediction from multiple predictor variables
Proc REG
Outcome variable is usually an interval variable
Predictor variables are usually interval variables, but can include dichotomous variables
Linear regression forms a linear composite of the predictor variables in such a manner that the correlation between the composite of predictors and the outcome variable is maximized
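
The idea of the linear composite can be checked with a sketch like this one, assuming the data set "one" exists; the names pred and wrtg_hat are made up for illustration. The predicted values are the composite, so their correlation with the outcome should equal the square root of the R-square that proc reg reports.

```sas
proc reg data=one;
   model wrtg=sex locus concpt mot rdg;
   output out=pred p=wrtg_hat;   /* save the predicted values */
run;
quit;

proc corr data=pred pearson nosimple;
   var wrtg wrtg_hat;
run;
```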


 SAS/GRAPH procedures

The regular SAS plot and chart procedures (e.g., Proc plot, Proc chart) produce low-quality output that is actually text.  SAS/GRAPH procedures (e.g., Gchart, Gplot, Gcontour) produce higher quality, graphical output.

Depending on the type of graph produced, Goptions can be used to control aspects of the output, including: text, titles,  labelling, frames, fill patterns,  colors, axes, scaling, number of bars, etc.

Because of the number of available options for SAS/GRAPH procedures, creating nice graphs with SAS can be complicated and time-consuming, with a lot of trial and error involved.

Editing charts is done in a chart window, and there are limited editing options; one is better off setting up the graphs beforehand with Goptions.

From graph windows, graphs can be exported in a half-dozen formats, including gif, ps, and eps.  With some extra SAS programming, these options can be expanded.
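
As one hedged sketch of the "extra SAS programming" route, a graph can be written straight to a file by naming a device driver and an output file in Goptions. The file name below is hypothetical, and the device drivers that are available (gif is assumed here) depend on the local SAS installation. It also assumes the data set "one" exists.

```sas
filename gout "~/hsp_chart.gif";     /* hypothetical output file */

goptions reset=all device=gif gsfname=gout gsfmode=replace;

proc gchart data=one;
   vbar hsp/ discrete type=freq;
run;
quit;
```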



A Sample Annotated SAS Program

  /* this is a comment, and can be placed almost anywhere in */
  /* your program (except the 1st 2 columns) */
  /* comments here are in boldface, but would not be bold*/
  /* in your program*/

filename in1 "~/public/minicourses/sas/hsb.dta";
filename out1 "~/public/minicourses/sas/hsb_2.dta";
options pagesize=55 linesize=72;

  /* the first filename command references an internal file "in1"*/
  /* to the external raw data file "hsb.dta" which lives in a*/
  /* subdirectory "public" in my Unix home directory*/
  /* the second filename references a new data set "hsb_2.dta"*/
  /* which will house a permanent data set created below*/


proc format;
   value sexfmt   1="MALE"
                          2="FEMALE";

   value racefmt  1="HISPANIC"
                          2="ASIAN"
                          3="BLACK"
                          4="WHITE";

   value sesfmt   1="LOWER"
                          2="MIDDLE"
                          3="UPPER";

   value schfmt   1="PUBLIC"
                          2="PRIVATE";

   value hspfmt   1="GEN"
                          2="ACAD"
                          3="VOC";

  /* proc format is used to assign value labels to values*/
  /* the formats are assigned to variables in the data step*/

 

data one; infile in1 missover;
input
id sex race ses sctyp hsp locus concpt mot car rdg wrtg math sci civ;
format sex sexfmt. race racefmt. ses sesfmt. sctyp schfmt. hsp hspfmt.;

  /* the variables are read in free-field format*/
  /* value labels are assigned to particular variables*/
  /* with the format command*/

rw_dif=rdg-wrtg;
rm_dif=rdg-math;

  /* two new variables (difference scores) are created*/

 

*subsetting cases;

  /* this is another type of comment; the asterisk and semi-colon*/
  /* are necessary*/

 

data subset_1; set one;
   if sctyp=1 then output;

  /* a new data set is created which contains only cases whose*/
  /* school type is 1*/

proc freq data=subset_1;
   tables hsp;

 

data subset_2; set one;
   if sctyp=2 then delete;

  /* another new data set is created which contains only cases whose*/
  /* school type is not equal to 2*/

 

data subset_3; set one;
   if (hsp=1 or hsp=2) and sctyp=1 then output;

 

data subset_4; set one;
   if (hsp=1 or hsp=2) and sctyp=1 then delete;

  /* creating complicated subsets*/

 

*subsetting variables;

data subset_5; set subset_2;
   keep id sex race--hsp rdg--civ;

data subset_6; set subset_2;
   keep id locus--car;

  /* creating subsets where only certain variables are kept*/

 

*concatenating data sets;

data mset_7; set subset_3 subset_4;

  /* combining data sets "vertically"  (i.e., adding cases)*/

 

*match-merging data sets;

proc sort data=subset_5;
   by id;

proc sort data=subset_6;
   by id;

data mset_8;
   merge subset_5 subset_6;
   by id;

  /* combining data sets "horizontally"  (i.e., adding variables)*/
  /* matching by id; data sets must be sorted first*/

 

*getting a summary data set;

proc means data=one n mean std;
   var math;
   output out=mnset_9 mean=mean1;

proc print data=mnset_9;

  /* many procedures let you create an output data set*/
  /* this creates a data set with a single case, which has*/
  /* the mean math score for all original cases*/


*getting a summary data set by groups;

proc sort data=one;
   by hsp sex;

proc means data=one n mean std noprint;
   var math;
   by hsp sex;
   output out=mnset_10 mean=mean2;

proc print data=mnset_10;

  /* this creates a data set with the mean math score*/
  /* for subgroups.  Cases must be sorted first*/

 

*sending a data set to a file;

data mnset_11; set mnset_10; file out1;
   put hsp 1  sex 3   @5 mean2 5.2;

  /* this creates a  permanent data set sent to the file "out1"*/
  /* defined at the beginning of the program*/
  /* the put statement defines what variables are written to*/
  /* what columns and in what format*/

 

*getting correlations;

proc corr data=one pearson nosimple;
   var rdg--civ;

proc corr data=one spearman nosimple;
   var rdg--civ;

  /* inferential statistics: correlations*/

 

*doing chi-squared analyses;

proc freq data=one;
   tables race*ses/ chisq expected nopercent cellchi2;

  /* a chi-squared analysis*/

*doing t-tests;

proc means data=one mean t prt;
   var rw_dif rm_dif;

  /* paired (i.e., dependent) t-tests*/
 

proc ttest data=one;
   class sex;
   var rdg math;

  /* independent t-tests*/

 

*doing one-way ANOVAs;

proc glm data=one;
   class ses;
   model math=ses;
   means ses/bon lines;

proc glm data=one;
   class race;
   model locus=race;
   means race/bon lines;

  /* one-way analyses of variance*/

proc glm data=one;
   model rdg wrtg math sci=/nouni;
   repeated measure 4 contrast(1)/summary nom;

  /* a repeated measures one-way analysis of variance*/

 

*doing multiple regression;

proc reg data=one;
   model wrtg=sex locus concpt mot rdg/ stb;

  /* linear regression*/

 

  /* what follows are examples of graphs that can be*/
  /* created using sasgraph procedures*/
  /* options, patterns, symbols, colors, axes, etc.*/
  /* are all defined and controlled*/

goptions reset=all hsize=6 in vsize=6 in htitle=5 pct border gwait=5;

pattern1 value=solid color=blue;
pattern2 value=solid color=red;
pattern3 value=solid color=green;

axis1 label=('HIGH SCHOOL PROGRAM');
axis2 label=('PROGRAM');
axis3 label=('SCIENCE SCORES');

symbol1 color=blue value='';
symbol2 color=red value='*';

proc gchart data=one;
   vbar hsp/
   type=freq discrete
   caxis=black coutline=black frame freq clipref width=8
   ref=0 to 300 by 100 minor=1
   patternid=midpoint maxis=axis1 raxis=0 to 350 by 50;

proc gchart data=one;
   vbar hsp/
   type=mean sumvar=rdg discrete
   caxis=black coutline=black frame clipref mean width=8
   ref=30 to 50 by 10 minor=1
   patternid=midpoint maxis=axis1 raxis=30 to 60 by 10;

proc gchart data=one;
   vbar hsp/
   type=pct discrete group=sex g100
   caxis=black coutline=black frame clipref pct width=8
   ref=0 to 50 by 10 minor=1
   patternid=midpoint maxis=axis2 raxis=0 to 60 by 10;

proc gchart data=one;
   vbar sci/
   type=freq
   caxis=black coutline=black frame clipref
   ref=0 to 100 by 10 minor=1 SPACE=0.0
   maxis=axis3 raxis=0 to 110 by 10;

proc gplot data=one;
   plot wrtg*rdg=sex ;

run;

quit;

  /* the quit command is necessary to kill*/
  /* the sasgraph processes*/
 
 


Some Useful SAS Procedures


Base Procs

Proc               Description

Format             Defines value labels
Print              Prints variable values for each case
Sort               Sorts cases by one or more variables
Rank               Computes ranked values for a variable
Freq               Produces one-way or two-way frequency tables
Univariate         Computes descriptive statistics for a variable
Means              Computes descriptive statistics for a variable
Corr               Computes correlations among variables
Chart or Gchart    Produces bar charts, pie charts, etc.
Plot or Gplot      Produces scatterplots

 

Stat Procs

Proc               Description

Anova              Conducts an analysis of variance
Cancorr            Conducts a canonical correlation analysis
Cluster            Conducts a cluster analysis
Discrim            Conducts a discriminant analysis
Factor             Conducts a factor analysis
Freq               Conducts a chi-squared analysis
GLM                Used for complicated linear analyses (General Linear Model)
Princomp           Conducts a principal components analysis
Reg                Conducts a linear regression analysis
Score              Uses output of other stat procs to compute composite scores
Tree               Produces a dendrogram for a cluster analysis
Ttest              Conducts an independent or dependent (paired) t-test





Some useful, basic Unix commands

Remember, Unix is case sensitive. In the table below, name, name1, and name2 stand for file or directory names that you supply.

Command            Description

ls                 lists files and subdirectories in the current directory
cd /name           navigates to the specified directory
cd name            navigates to the specified subdirectory
cd ..              navigates to the directory one level above the current directory
cd                 navigates to your home directory
ps                 lists the current processes you own and their process id numbers (pid)
kill -9 #####      kills the process with the specified process id
top                lists the big jobs currently running; q quits
man command        displays the manual page for the specified command
mkdir name         creates a directory with the specified name
rmdir name         removes the specified directory (must be empty)
rm name            removes the specified file (permanently)
rm *.*             removes all files in the current directory whose names contain a dot (be careful!)
cp name1 name2     makes a copy of the first file and names it the second file
df -k              lists disk space usage for each file system, in kilobytes




Other useful tips

Control-z will usually halt a job or process that has gone bad or is not responding. Doing a ps and then killing the process can be used to eliminate that process.

As SAS is running, it creates a work directory for your job to hold temporary files, usually in a "scratch" directory. If the SAS session ends cleanly (as it usually does), the work directory closes down. If, however, a SAS session dies a bad death, it can leave behind work files that can possibly interfere with future SAS sessions. If that happens, you may be asked to clean up what was left behind.

If you intend to run big jobs (i.e., you're working with huge data files) you might want to check with Research Computing for advice before you start. Certain aspects of a SAS session can be customized that may help you run efficiently.

Please don't run more than one SAS session at a time; it may interfere with other users.


Permanent SAS Data Sets

 Creating a library

1) Create a directory (i.e., folder or library) where you want to store the data files

    Unix: mkdir whatever (e.g., mkdir rick_sas_perm)

    N.B. A directory named sasuser.v91 is already built for you

2) In the Explorer Window, go to File/New/Library

3) Name the library (e.g., rick_sas) and enable it at startup

4) Browse to or write the path (e.g., /afs/northstar.dartmouth.edu/ufac/rbarton/rick_sas_perm)

 

Saving a permanent SAS data set

Method 1

1) Read your raw (i.e., text) data into SAS using the filename/infile/data commands

Example:

______________________________________________________________________________

filename in1 "~/public/minicourses/sas/hsb.dta";

data one; infile in1 missover;
input
id sex race ses sctyp hsp locus concpt mot car rdg wrtg math sci civ;

run;

______________________________________________________________________________

 

2) From the Tools menu, open up the Table editor

3) From the Viewtable File menu, Open the data set’s name from the Work library

4) From the Viewtable File menu, Save as the filename you wish to the library you wish

(e.g., Save as rick1 to the library rick_sas)

5) Close out the Viewtable window



Method 2

1) Assign a library name using a LIBNAME statement in a SAS program

2) In your data step that reads the raw data, use a two-level libref.filename to name the data set

Example:

______________________________________________________________________________

filename in1 "~/public/minicourses/sas/hsb.dta";
libname rick_sas "~/rick_sas_perm";

data rick_sas.rick2; infile in1 missover;
input
id sex race ses sctyp hsp locus concpt mot car rdg wrtg math sci civ;

run;

_____________________________________________________________________________


Using SAS data sets

1) Identify the library using the LIBNAME statement

2) In data steps and procedures, use the two-level libref.filename to refer to the data set

Example:

_____________________________________________________________________________

libname rick_sas "~/rick_sas_perm";

proc print data=rick_sas.rick2;

run;

______________________________________________________________________________
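
A slightly longer sketch of working with a permanent library follows. It assumes the library and the data set rick_sas.rick2 created under Method 2 above; the name rick2_pub is made up for illustration.

```sas
libname rick_sas "~/rick_sas_perm";

  /* list the data sets stored in the library */
proc contents data=rick_sas._all_ nods;
run;

  /* use a permanent data set as input and save a subset */
  /* back to the same library */
data rick_sas.rick2_pub; set rick_sas.rick2;
   if sctyp=1;
run;
```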

 

A note on SAS data sets

SAS data sets are somewhat unusual among statistical programs in that user-defined formats (such as value labels) are saved in a file separate from the data. This can cause difficulties when trying to move SAS data into other programs.
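
One common way to keep formats together with a permanent data set is sketched below: store the formats in the same library and tell SAS to search there. It assumes the library and data set from Method 2 above, and reuses the sexfmt format from the sample program.

```sas
libname rick_sas "~/rick_sas_perm";

  /* store the formats in a catalog in the permanent library */
proc format library=rick_sas;
   value sexfmt 1="MALE"
                2="FEMALE";
run;

  /* tell SAS to look in that library for formats */
options fmtsearch=(rick_sas);

proc freq data=rick_sas.rick2;
   tables sex;
   format sex sexfmt.;
run;
```

In a later session, the libname and options statements are enough to make the stored formats available again; proc format does not need to be rerun.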


 SAS Help at Dartmouth College

Help: SAS has help facilities that are pretty good, with sample programs

Tutorial: There is a tutorial under the help menu

Manuals: The library has a number of SAS manuals

Books: If you plan on working with SAS a lot there are "how to" books out there

Other users: Sometimes they know more about SAS than I do

SAS web site: has FAQs and support for the extremely technical questions

Research Computing web site: ~rc


Richard Barton
Berry 179C
646-0255

richard.barton@dartmouth.edu