I searched the forum without finding an answer, and I've been poking around in the SAS documentation without much luck.
Problem 1) I would like to tell SAS to randomly select x% of my spreadsheet subject to the constraint that all values for a particular variable are unique.
For example, if I have a sheet of 500 people with birthdays, I want a 5% random sample of these observations. However, I want zero birthdays repeated in the sample.
proc surveyselect data=birthday method=srs rate=.05 seed=12345
    out=uniqueBD;
run;
This is the standard syntax, but I have not found a way to add a constraint that a birthday must be unique to end up in the sample. Neither Google nor the SAS documentation has given me what I'm looking for so far.
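One possible way to get that constraint, offered as a rough sketch rather than a vetted answer: first keep one randomly chosen person per birthday, then draw the simple random sample from that reduced set. The variable name birthday and the work data sets shuffled and one_per_bd are assumptions for illustration, and sampsize=25 is simply 5% of the 500 people in the example. Note that this gives each distinct birthday an equal chance of selection, which is not quite the same as giving each person an equal chance.

```sas
/* Assumption: data set BIRTHDAY has a variable named BIRTHDAY.      */
/* Attach a random sort key to every observation.                    */
data shuffled;
    set birthday;
    call streaminit(12345);   /* only the first call sets the seed  */
    _u = rand('uniform');
run;

/* Within each birthday, order the people randomly. */
proc sort data=shuffled;
    by birthday _u;
run;

/* Keep one randomly chosen person per distinct birthday. */
data one_per_bd;
    set shuffled;
    by birthday;
    if first.birthday;
    drop _u;
run;

/* Sample from the de-duplicated set; 25 = 5% of the original 500. */
proc surveyselect data=one_per_bd method=srs sampsize=25 seed=12345
    out=uniqueBD;
run;
```

Using rate=.05 in the last step instead would take 5% of the distinct birthdays rather than 5% of the original 500 rows.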
Problem 2) I want to assign a unique, randomly generated ID to each observation subject to the constraint that some other feature is constant for a given ID.
For example, I want to assign a randomly generated, unique ID for each observation such that observations with the same address receive the same unique code, but such that different addresses have different codes.
Here is how I was thinking of approaching it:
Use a DATA step to create a new variable for the ID, with a random number function generating an 8-digit number. I'm not strong enough in SAS to tell it to assign a distinct value to each address while giving matching addresses the same value.
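A minimal sketch of that idea, under some assumptions: the data set is called people, the address is stored in a character variable named address, and addresses match only when they are spelled identically. The names addresses, address_ids, people_with_id, and id are made up for illustration. It builds a list of distinct addresses, draws a random 8-digit number for each one (redrawing if the number was already used), and merges the IDs back onto the original rows.

```sas
/* Assumption: data set PEOPLE with a character variable ADDRESS. */
/* One row per distinct address.                                   */
proc sort data=people out=addresses(keep=address) nodupkey;
    by address;
run;

/* Give each distinct address a random 8-digit ID, using a hash   */
/* object to remember which IDs have already been handed out.     */
data address_ids;
    set addresses;
    if _n_ = 1 then do;
        call streaminit(12345);
        declare hash used();
        used.defineKey('id');
        used.defineDone();
    end;
    /* Redraw until the ID is not already in the hash. */
    do until (used.check() ne 0);
        id = 10000000 + floor(rand('uniform') * 90000000);
    end;
    used.add();
run;

/* Attach the address-level ID to every original observation. */
proc sql;
    create table people_with_id as
    select p.*, a.id
    from people as p
    left join address_ids as a
        on p.address = a.address;
quit;
```

Because the match is exact, variations such as "12 Main St" versus "12 Main Street" would receive different IDs unless the addresses are standardized first.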
Any guidance with one or both is appreciated. The first problem is the most important.