#### Janus

##### New Member
Since SAS isn't as reliant on UDFs as much as R is, I thought a thread on tips/useful gems hidden in SAS documentation/stuff you learned would be more appropriate than a thread on functions. So, share anything you think may be useful. Or, if you have general questions, post them here and someone else may have the answer. This can include everything in SAS, not just BASE or STAT.

So, I'll start with proc sql. Lots of times I'll have specific rows in mind that I want from a table, or I just want to look at the first few. But, SAS's implementation of SQL doesn't adhere to the ANSI standards and has their own proprietary stuff in there, so it took me a while to find out how to call specific rows.

The SAS command is MONOTONIC(). E.g., the syntax for selecting the first 100 rows from a table:

Code:
proc sql;
select x
from data
where monotonic() <= 100
; quit;
Never would've guess that one. For comparison, here is Oracle:

Code:
...
where rownum <=100
...
And T-SQL

Code:
select top 100 x
...

#### Janus

##### New Member
Here is a situation I sometimes find myself in. In my dataset, I may need to examine a specific subset of closely related variables. Let's say that I have added a prefix of "bin_" to every binary variable and they are scattered throughout my dataset (i.e., not sequential variable numbers). I want to call all these variables at once, but since they are not sequentially ordered in the dataset I cannot call bin_var1 -- bin_var20. So, here is a method for grabbing all of those variables into a macro, assuming that the only variables that begin "bin_" are ones I want.

Code:
proc contents data=data out=data_contents (keep = name where=(upcase(substr(name,1,4)) = 'BIN_')) noprint; run;

proc sql noprint;
select name into:bin_vars separated by ' ' from data_contents;
quit;

%put &bin_vars.;

#### liufeng511

##### New Member
Thanks for your sharing! We are grateful for you!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
If Janus has a middle name it is definitely "proq sql"! If you have not, you should write a Proc sql book : )

#### Janus

##### New Member
Ha, thanks. I don't know that much about proc sql though. I will try to share some more BASE and STAT tips. Here is one similar to the above. Sometimes the variables you want have matching strings at the end. Say you've already created a contents dataset. To narrow it down to just those variables that end in "_bin":

Code:
if substr(reverse(trim(upcase(name))),1,4) = 'NIB_';

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I just came across "Using SAS to Analyze the Summary Data", NESUG 2006. I was familiar with these approaches, but had not seen the %SUM_ANOVA Macro for conducting overall test and pairwise comparisons without raw data. At times I am very inept with actually getting Macros to work, so I cut-n-pasted the actual functioning code for anyone's use, you just need to change out the N, Mean, and STD at the bottom of the code.

Well I guess the code was too long, I cannot post it. Though if you want to track it down the author was Zhenyi Xue.

#### noetsi

##### No cake for spunky
Strangely SAS (at least in EG) lets you limit some tasks easily (as when you run random numbers where it specifically asks how many rows you want) but not in queries.

#### noetsi

##### No cake for spunky
Here is a really simple tip which can be useful. It shows how to find duplicate rows. Here I am looking for duplicates in cusprofile_key but it can be in anything.

PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DATA_0000 AS
SELECT t1.cusprofile_key,
/* COUNT_of_cusprofile_key */
(COUNT(t1.cusprofile_key)) AS COUNT_of_cusprofile_key
FROM WORK.DATA t1
GROUP BY t1.cusprofile_key
HAVING (CALCULATED COUNT_of_cusprofile_key) > 1;
QUIT;

If anyone finds it useful I can show you how to find if data in one table is in another, or not in another table.

#### noetsi

##### No cake for spunky
I have a question. I am preparing, for work primarily, a review of methods such as logistic regression, factor analysis, and so on in the context of SAS. If anyone is interested, and I can figure out how to post papers here, I can attach that.

This is not advanced statistics, as I am not an advanced statistician

#### Dason

As a mod you should know how to attach things!

You need to go to the "Go advanced" way to reply to a post and then you'll see a paperclip on a paper as one of the buttons you can click - click this and it will bring up a dialog for attaching things.

#### noetsi

##### No cake for spunky
I am a mod imported just to fight the evil spam bots - I did not get a lot of training in other stuff

I don't really have anywhere to upload this to so I will attach it.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Noetsi I would be happy to see the info.

#### noetsi

##### No cake for spunky
I will paste the preliminary stuff (its only on logistic regression and factor analysis) this week. If anyone sees any errors in it, please let me know.

#### Jrb599

##### New Member
I would be curious to see what your method is.
Here is a really simple tip which can be useful. It shows how to find duplicate rows. Here I am looking for duplicates in cusprofile_key but it can be in anything.

PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DATA_0000 AS
SELECT t1.cusprofile_key,
/* COUNT_of_cusprofile_key */
(COUNT(t1.cusprofile_key)) AS COUNT_of_cusprofile_key
FROM WORK.DATA t1
GROUP BY t1.cusprofile_key
HAVING (CALCULATED COUNT_of_cusprofile_key) > 1;
QUIT;

If anyone finds it useful I can show you how to find if data in one table is in another, or not in another table.

#### noetsi

##### No cake for spunky
Being a SAS master is more useful than being a jedi master

To chose what is in t1 but not t2

PROC SQL;
CREATE TABLE SASUSER.p AS SELECT T1.SSN
FROM SASUSER.T2 AS T2
RIGHT JOIN SASUSER.T1 AS T1 ON (T2.SSN = T1.SSN)
WHERE T2.SSN IS NULL ;
QUIT;

To find what is in t2 but not t1

PROC SQL;
CREATE TABLE SASUSER.p2 AS SELECT T2.SSN
FROM SASUSER.T2 AS T2
LEFT JOIN SASUSER.T1 AS T1 ON (T2.SSN = T1.SSN)
WHERE T1.SSN IS NULL ;
QUIT;

T1 and T2 are two different tables. Obviously you can select other then one field. I normally use this only to select one field since the whole point is to find something in one but not the other table

Last edited:

#### Jrb599

##### New Member
It is - a lot of people do

where VAR in (select VAR1 from TABLE1) which much slower than the join you did....I was just checking

#### noetsi

##### No cake for spunky
It is - a lot of people do

where VAR in (select VAR1 from TABLE1) which much slower than the join you did....I was just checking
I am not sure what you mean? Is there a faster way to do what I listed? Well I am sure there is

The truth is I "stole" the logic from ACCESS (one of the few things I liked in ACCESS). I converted one of their wizards to PROC SQL code.

#### noetsi

##### No cake for spunky
This is a useful link to ways to test for various violations of model assumptions. Note that the point is to do it all in PROC REG. Thus there are simpler ways to do this, and better in the case say of autocorrelation, in other SAS PROC's notably the times series procs.

http://www2.sas.com/proceedings/forum2007/131-2007.pdf