SAS Permutation Coding Problem

#1
Hi,

Noob here. I need an elegant solution to the following problem in SAS:

The source dataset A contains N observations (= rows) of a single variable X, i.e. x1, x2, ..., xN. From it, I need to generate an output dataset B with one column (variable Y) and N*(N-1)/2 rows containing all possible permutations yi = xj*xk where j < k and k <= N.

Here's an example with N = 5:
Dataset A [nobs = N = 5]:
3
5
7
11
13​

Dataset B [nobs = N*(N-1)/2 = 5*4/2 = 10]:
15 (= 3*5)
21 (= 3*7)
33 (= 3*11)
39 (= 3*13)
35 (= 5*7)
55 (= 5*11)
65 (= 5*13)
77 (= 7*11)
91 (= 7*13)
143 (= 11*13)​
What is the most efficient way of programming this scheme for the general case in SAS? One option is to use macros but I haven't yet found a neat, working way to do it with, say, a single "DATA" step using the "FETCHOBS()" function. The problem seems to require being able to manipulate SAS's automatic variable "_N_" back and forth.

Thanks.
 
#2
Hope this helps

Code:
DATA A;
input X;
datalines4;
3
5
7
11
13
;;;;
RUN;

PROC SQL noprint;
create table results as
select (a.x* b.x) as AB_X
from a a left join a b on 1=1
where a.x<b.x
order by AB_X
;QUIT;

IF size really becomes a problem - I would look into hash tables
 
Last edited:
#4
Good.

I was thinking of a few other ways to do it, and maybe I can code some later, but here they are:
1) Hash Tables - Going to be the fastest
2) Use PROC SQL to load table A into a macro and use the scan function in a datastep
3) PROC SORT in descending order and retain previous values. This may be a little
tricky when you have repeating values

Anyways, glad that helped.
 
#5
Good.
2) Use PROC SQL to load table A into a macro and use the scan function in a datastep

Code:
DATA A;
input X;
datalines4;
3
5
7
11
13
;;;;
RUN;

PROC SQL noprint;
 select X into :X separated by '|'
 from A;
QUIT;

DATA A;
  set A;

  DO i=1 to countw("&x",'|');
  
    IF X< input(scan("&x",i,'|'),best12.) then do;  
        AB_X=X*input(scan("&x",i,'|'),best12.);
        output;
    END;
  END;

 drop i;
RUN;
 
Last edited:

hlsmith

Less is more. Stay pure. Stay poor.
#6
Jrb599,

I am not familiar with the multiple ";" in your data step. What purpose do the four ; serve?
 
#7
Argh good catch. I updated the code to be datalines4 instead of datalines.

As you know the cards and datalines statement serve the same purpose and can be interchanged.

There's actually a 3rd statement that can be used which is datalines4. This indicates that all your observations will end when SAS sees 4 semicolons instead of just 1. It has to do with input that data that has a semicolon in it.

Using datalines4 is just habit for me. Let me know if it's not clear.
 
Last edited:
#10
For interest’s sake, here’s a native SAS version:

Code:
DATA B;
   KEEP y;
   dsid = OPEN("A", "IN");
   NumObs = ATTRN(dsid, "NOBS");
   iX = VARNUM(dsid, "x");
   DO i = 1 TO (NumObs-1);
      RetCode = FETCHOBS(dsid, i);
      x = GETVARN(dsid, iX);
      DO j = (i+1) TO NumObs;
         RetCode = FETCHOBS(dsid, j);
         y = x*GETVARN(dsid, iX);
         OUTPUT;
      END;
   END;
RUN;