Share your SAS tips

#1
Since SAS isn't as reliant on UDFs as much as R is, I thought a thread on tips/useful gems hidden in SAS documentation/stuff you learned would be more appropriate than a thread on functions. So, share anything you think may be useful. Or, if you have general questions, post them here and someone else may have the answer. This can include everything in SAS, not just BASE or STAT.

So, I'll start with proc sql. Lots of times I'll have specific rows in mind that I want from a table, or I just want to look at the first few. But, SAS's implementation of SQL doesn't adhere to the ANSI standards and has their own proprietary stuff in there, so it took me a while to find out how to call specific rows.

The SAS command is MONOTONIC(). E.g., the syntax for selecting the first 100 rows from a table:

Code:
proc sql;
select x
from data
where monotonic() <= 100
; quit;
Never would've guess that one. For comparison, here is Oracle:

Code:
...
where rownum <=100
...
And T-SQL

Code:
select top 100 x
...
 
#2
Here is a situation I sometimes find myself in. In my dataset, I may need to examine a specific subset of closely related variables. Let's say that I have added a prefix of "bin_" to every binary variable and they are scattered throughout my dataset (i.e., not sequential variable numbers). I want to call all these variables at once, but since they are not sequentially ordered in the dataset I cannot call bin_var1 -- bin_var20. So, here is a method for grabbing all of those variables into a macro, assuming that the only variables that begin "bin_" are ones I want.

Code:
proc contents data=data out=data_contents (keep = name where=(upcase(substr(name,1,4)) = 'BIN_')) noprint; run;

proc sql noprint;
select name into:bin_vars separated by ' ' from data_contents;
quit;

%put &bin_vars.;
 
#5
Ha, thanks. I don't know that much about proc sql though. I will try to share some more BASE and STAT tips. Here is one similar to the above. Sometimes the variables you want have matching strings at the end. Say you've already created a contents dataset. To narrow it down to just those variables that end in "_bin":

Code:
if substr(reverse(trim(upcase(name))),1,4) = 'NIB_';
 

hlsmith

Not a robit
#6
I just came across "Using SAS to Analyze the Summary Data", NESUG 2006. I was familiar with these approaches, but had not seen the %SUM_ANOVA Macro for conducting overall test and pairwise comparisons without raw data. At times I am very inept with actually getting Macros to work, so I cut-n-pasted the actual functioning code for anyone's use, you just need to change out the N, Mean, and STD at the bottom of the code.

Well I guess the code was too long, I cannot post it. Though if you want to track it down the author was Zhenyi Xue.
 

noetsi

Fortran must die
#7
Strangely SAS (at least in EG) lets you limit some tasks easily (as when you run random numbers where it specifically asks how many rows you want) but not in queries.
 

noetsi

Fortran must die
#8
Here is a really simple tip which can be useful. It shows how to find duplicate rows. Here I am looking for duplicates in cusprofile_key but it can be in anything.


PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DATA_0000 AS
SELECT t1.cusprofile_key,
/* COUNT_of_cusprofile_key */
(COUNT(t1.cusprofile_key)) AS COUNT_of_cusprofile_key
FROM WORK.DATA t1
GROUP BY t1.cusprofile_key
HAVING (CALCULATED COUNT_of_cusprofile_key) > 1;
QUIT;




If anyone finds it useful I can show you how to find if data in one table is in another, or not in another table.
 

noetsi

Fortran must die
#9
I have a question. I am preparing, for work primarily, a review of methods such as logistic regression, factor analysis, and so on in the context of SAS. If anyone is interested, and I can figure out how to post papers here, I can attach that.

This is not advanced statistics, as I am not an advanced statistician :p
 

Dason

Ambassador to the humans
#10
As a mod you should know how to attach things!

You need to go to the "Go advanced" way to reply to a post and then you'll see a paperclip on a paper as one of the buttons you can click - click this and it will bring up a dialog for attaching things.

Otherwise you could always upload the paper somewhere and just link to that instead.
 

noetsi

Fortran must die
#11
I am a mod imported just to fight the evil spam bots - I did not get a lot of training in other stuff :p

I don't really have anywhere to upload this to so I will attach it.
 

noetsi

Fortran must die
#13
I will paste the preliminary stuff (its only on logistic regression and factor analysis) this week. If anyone sees any errors in it, please let me know.
 
#14
I would be curious to see what your method is.
Here is a really simple tip which can be useful. It shows how to find duplicate rows. Here I am looking for duplicates in cusprofile_key but it can be in anything.


PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DATA_0000 AS
SELECT t1.cusprofile_key,
/* COUNT_of_cusprofile_key */
(COUNT(t1.cusprofile_key)) AS COUNT_of_cusprofile_key
FROM WORK.DATA t1
GROUP BY t1.cusprofile_key
HAVING (CALCULATED COUNT_of_cusprofile_key) > 1;
QUIT;




If anyone finds it useful I can show you how to find if data in one table is in another, or not in another table.
 

noetsi

Fortran must die
#15
Being a SAS master is more useful than being a jedi master :p

To chose what is in t1 but not t2


PROC SQL;
CREATE TABLE SASUSER.p AS SELECT T1.SSN
FROM SASUSER.T2 AS T2
RIGHT JOIN SASUSER.T1 AS T1 ON (T2.SSN = T1.SSN)
WHERE T2.SSN IS NULL ;
QUIT;

To find what is in t2 but not t1


PROC SQL;
CREATE TABLE SASUSER.p2 AS SELECT T2.SSN
FROM SASUSER.T2 AS T2
LEFT JOIN SASUSER.T1 AS T1 ON (T2.SSN = T1.SSN)
WHERE T1.SSN IS NULL ;
QUIT;

T1 and T2 are two different tables. Obviously you can select other then one field. I normally use this only to select one field since the whole point is to find something in one but not the other table
 
Last edited:

noetsi

Fortran must die
#17
It is - a lot of people do

where VAR in (select VAR1 from TABLE1) which much slower than the join you did....I was just checking :)
I am not sure what you mean? Is there a faster way to do what I listed? Well I am sure there is:p

The truth is I "stole" the logic from ACCESS (one of the few things I liked in ACCESS). I converted one of their wizards to PROC SQL code.