Two for one: Unique variable values


TS Contributor
I searched the forum without finding an answer, and I've been digging through the SAS documentation without much luck.

Problem 1) I would like SAS to randomly select x% of the rows in my spreadsheet, subject to the constraint that no value of a particular variable appears more than once in the sample.
For example, if I have a sheet of 500 people with birthdays, I want a 5% random sample of those observations, but with no birthday repeated in the sample.

proc surveyselect data=birthday method=srs rate=.05 seed=12345
    out=uniqueBD;
run;

This is the standard syntax, but I haven't found a way to add a constraint that each birthday can appear only once in the sample. Neither Google nor the SAS documentation has turned up what I'm looking for so far.
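One possible approach, offered as a sketch rather than the definitive answer: first reduce the data to one randomly chosen row per birthday, then run the same simple random sample on that reduced table. The data set `birthday` is generated here from made-up values, and the variable names `person`, `bday` and `rkey` and the intermediate data sets are hypothetical. Because the de-duplicated table has fewer rows than the original, `RATE=.05` would give 5% of the distinct birthdays rather than 5% of the 500 people, so the sketch asks for `SAMPSIZE=25` (5% of 500) instead; that only works if there are at least 25 distinct birthdays.

```sas
/* Hypothetical example data: 500 people with random birthdays in one year */
data birthday;
  call streaminit(12345);
  do person = 1 to 500;
    bday = '01JAN2000'd + floor(rand('uniform') * 366);
    output;
  end;
  format bday date9.;
run;

/* Attach a random key so the row kept for a repeated birthday is chosen at random */
data bd_keyed;
  set birthday;
  if _n_ = 1 then call streaminit(54321);
  rkey = rand('uniform');
run;

proc sort data=bd_keyed;
  by bday rkey;
run;

/* Keep one row per birthday: the one with the smallest random key */
data bd_unique(drop=rkey);
  set bd_keyed;
  by bday;
  if first.bday;
run;

/* Simple random sample of 25 rows (5% of the original 500) from the de-duplicated table */
proc surveyselect data=bd_unique method=srs sampsize=25 seed=12345
    out=uniqueBD;
run;
```

One design consequence worth noting: people who share a birthday with others are less likely to be selected than people whose birthday is unique in the data, because they first have to "win" their birthday group. Whether that matters depends on what the sample is for.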

Problem 2) I want to assign a randomly generated ID to each observation, subject to the constraint that observations sharing the value of some other variable get the same ID.
For example, observations with the same address should receive the same code, while different addresses should receive different codes.

Here is roughly how I was thinking of going about it:

Use a DATA step to create a new ID variable, and use a random number function to generate an 8-digit number. What I'm not strong enough in SAS to do is tell it: give each address its own value, but give matching addresses the same value.
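A minimal sketch of that idea, assuming the address is stored in a single character variable: generate the random 8-digit code once per distinct address (not once per observation), use a DATA step hash object to redraw whenever a code has already been used, and then merge the codes back onto every observation. The data set `people`, the variables `obs_id`, `address` and `addr_id`, and the sample addresses are all hypothetical.

```sas
/* Hypothetical example data: several observations share an address */
data people;
  infile datalines dlm=',';
  length address $40;
  input obs_id address $;
  datalines;
1,12 Oak St
2,7 Elm Ave
3,12 Oak St
4,99 Pine Rd
5,7 Elm Ave
;
run;

/* One row per distinct address */
proc sort data=people(keep=address) out=addresses nodupkey;
  by address;
run;

/* Draw a random 8-digit code per address, redrawing on collisions */
data address_ids;
  set addresses;
  if _n_ = 1 then do;
    declare hash used();          /* codes handed out so far */
    used.defineKey('addr_id');
    used.defineDone();
    call streaminit(2024);
  end;
  do until (used.check() ne 0);   /* CHECK returns 0 when the code already exists */
    addr_id = 10000000 + floor(rand('uniform') * 90000000);
  end;
  used.add();
run;

/* Attach the code to every observation with that address */
proc sort data=people;
  by address;
run;

data people_ids;
  merge people address_ids;
  by address;
run;
```

Generating the code per distinct address is what guarantees that matching addresses share a value; the hash lookup is what keeps different addresses from colliding. Note that addresses must match exactly as stored (spelling, spacing, case) to be treated as the same.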

Any guidance on one or both would be appreciated. The first problem is the more important one.


TS Contributor
"If the values in the original table are unique to start with, can't you just sample without replacement?"
I could, but they are not unique.

Take, for example, measuring children's heights over time. When a child enters the study, they are assigned a random number as an ID. Each measurement is entered as a row in the data set along with the ID, which identifies the child the measurement belongs to.

ID   height   date
1    50       x
2    65       x
3    70       x
1    55       x+delta
2    70       x+delta

where x is the start date and delta is the time between measurements.

A simple random sample of rows could pick ID 1 twice, but I want each child to appear at most once, so that I am sampling people rather than observations.
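If the sampling unit is the child, one option (a sketch under assumptions, not the only way) is to reduce the data to one row per ID, sample those IDs, and then pull back the measurements for the sampled children. The data set names `heights`, `children`, `sampled_children` and `sampled_heights` are hypothetical, and the dates are made-up stand-ins for x and x+delta. `SAMPSIZE=2` is used only because the toy data have three children; on real data `RATE=.05` could be used instead.

```sas
/* Hypothetical measurement data in the layout shown above */
data heights;
  input ID height date :date9.;
  format date date9.;
  datalines;
1 50 01JAN2020
2 65 01JAN2020
3 70 01JAN2020
1 55 01JUL2020
2 70 01JUL2020
;
run;

/* One row per child, so each child can be drawn at most once */
proc sort data=heights(keep=ID) out=children nodupkey;
  by ID;
run;

/* Sample children, not rows */
proc surveyselect data=children method=srs sampsize=2 seed=12345
    out=sampled_children;
run;

/* Pull back every measurement for the sampled children */
proc sql;
  create table sampled_heights as
  select h.*
  from heights as h
  where h.ID in (select ID from sampled_children)
  order by h.ID, h.date;
quit;
```

This keeps all measurements for each selected child. If only one measurement per child is wanted, the rows of `sampled_heights` can then be reduced to one per ID, for example by sorting on ID and a random key and keeping the first row in each BY group.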