How can I guarantee a "fully random" selection from a group?

#1
Hi all,

I need to draw a fully random 5% sample from a group of more than 1,000,000 elements.
The only criterion is that every element must have the same probability of being selected.
The selected 5% will later be compared against the full base, but there are no predefined attributes or segments for that comparison.
I want to know whether it is possible to measure how good the selection is, because I need to build some test cases. Is there a usual threshold for what is acceptable as "fully random"?
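For the selection itself, a simple random sample drawn without replacement gives every element exactly the same inclusion probability (n/N). Below is a minimal Python sketch, assuming the elements can be indexed 0 to N-1; the function name is hypothetical. CPython's random module is built on the Mersenne Twister, and recording the seed lets anyone reproduce the exact same sample later.

```python
import random
from typing import List, Optional


def draw_simple_random_sample(population_size: int, fraction: float,
                              seed: Optional[int] = None) -> List[int]:
    """Indices of a simple random sample drawn without replacement from
    range(population_size); every index has the same inclusion probability."""
    rng = random.Random(seed)  # CPython's random.Random is a Mersenne Twister
    sample_size = round(population_size * fraction)
    # random.sample picks sample_size distinct indices; every subset of that
    # size is equally likely to be returned
    return sorted(rng.sample(range(population_size), sample_size))


# Usage: a 5% sample of 1,000,000 elements, reproducible from the recorded seed
selected = draw_simple_random_sample(1_000_000, 0.05, seed=20240601)
print(len(selected))  # 50000
```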

Thanks for your help!
 

Mean Joe

#2
Do you know characteristics of the group, e.g. average age = 55, 30% black, 45% male? You could check that your selection does not deviate much from those characteristics.
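For a categorical characteristic such as race, one way to run this check is a chi-square goodness-of-fit test of the sample's category counts against the known population shares. A minimal Python sketch with hypothetical category labels; the critical values are the standard upper-5% points of the chi-square distribution. It ignores the finite population correction, which for a 5% sample drawn without replacement makes the test slightly conservative.

```python
from collections import Counter

# Upper 5% points of the chi-square distribution for 1 to 4 degrees of freedom
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_gof(sample_values, population_shares):
    """Pearson chi-square statistic comparing the sample's category counts with
    the counts expected from known population shares. population_shares must
    cover every category and sum to 1. Returns (statistic, degrees of freedom)."""
    counts = Counter(sample_values)
    n = len(sample_values)
    statistic = 0.0
    for category, share in population_shares.items():
        expected = n * share
        statistic += (counts[category] - expected) ** 2 / expected
    return statistic, len(population_shares) - 1


# Hypothetical check: is the sample's group mix consistent with the population mix?
sample = ["A"] * 310 + ["B"] * 440 + ["C"] * 250
stat, df = chi_square_gof(sample, {"A": 0.30, "B": 0.45, "C": 0.25})
print(f"chi-square = {stat:.3f} on {df} df; flagged: {stat > CHI2_CRIT_05[df]}")
```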
 

noetsi

#3
Why don't you just take a random sample? That is the gold standard for having the same probability of selection.

If you know the population parameters (for example, the percentage of a specific race), you can calculate the percentage difference between your sample and the population. I don't know what would be considered an unacceptable deviation. You could also calculate how many standard errors (standard deviations of the sample mean) your sample is from the population mean, if the data is roughly normal.
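A minimal Python sketch of both calculations, assuming you know the population mean and the population standard deviation (divisor N) of a numeric attribute such as age; the function names are illustrative. Because the sample is drawn without replacement from a finite population, the standard error includes the finite population correction sqrt((N - n) / (N - 1)). With a sample of 50,000, the central limit theorem makes the sample mean approximately normal even when the attribute itself is not, so |z| above about 1.96 would turn up in roughly 5% of genuinely random samples.

```python
import math
import statistics


def z_score_of_mean(sample, population_mean, population_sd, population_size):
    """Distance of the sample mean from the known population mean, measured in
    standard errors, for a simple random sample drawn without replacement.
    population_sd is the population standard deviation (divisor N)."""
    n = len(sample)
    # Finite population correction: sampling 5% without replacement varies a
    # little less than sampling from an infinite population would
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    standard_error = population_sd / math.sqrt(n) * fpc
    return (statistics.fmean(sample) - population_mean) / standard_error


def percent_difference(sample_value, population_value):
    """Relative difference of a sample statistic from the population value, in percent."""
    return 100.0 * (sample_value - population_value) / population_value
```

For example, with hypothetical figures of a population average age of 55 (SD 10, N = 1,000,000) and a 50,000-element sample averaging 55.5, the sample mean lies about 11.5 standard errors away, far outside what random selection would plausibly produce.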
 
#4
Hi Mean Joe, hi noetsi,

Thanks for your replies and help!

There are many attributes of the group that can be measured. My problem is that I don't know which of them (or which others) will be used later to analyze the frame against the sample. Analysts may later tell me that the sample wasn't really random, so I want some way to guarantee the quality of the random selection, or to state a probability, confidence level, or reliability for it.

Is it a good solution to simply choose, e.g., 5 different attributes and check whether the selection has the same characteristics for them as the frame has? Will the result also be valid for other attributes?
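Checking a handful of attributes cannot prove that a sample is random, and a perfectly random sample will still "fail" some checks: at the 5% level, roughly 1 attribute in 20 shows a significant difference by chance alone. The simulation below is a sketch with synthetic, hypothetical normal attributes, scaled down to 10,000 elements so it runs quickly. It draws one genuine simple random sample and counts how many of 200 independent attributes get flagged; the function name is illustrative.

```python
import math
import random


def share_of_attributes_flagged(population_size=10_000, fraction=0.05,
                                n_attributes=200, seed=0):
    """Draw ONE genuine simple random sample, then test many independent
    synthetic attributes against their population means and return the share
    of attributes whose |z| exceeds 1.96 (two-sided test at the 5% level)."""
    rng = random.Random(seed)
    n = round(population_size * fraction)
    sample_idx = rng.sample(range(population_size), n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    flagged = 0
    for _ in range(n_attributes):
        # Hypothetical attribute: an independent standard-normal value per element
        values = [rng.gauss(0.0, 1.0) for _ in range(population_size)]
        mu = math.fsum(values) / population_size
        sigma = math.sqrt(math.fsum((v - mu) ** 2 for v in values) / population_size)
        sample_mean = math.fsum(values[i] for i in sample_idx) / n
        # Standard error with the finite population correction
        z = (sample_mean - mu) / (sigma / math.sqrt(n) * fpc)
        if abs(z) > 1.96:
            flagged += 1
    return flagged / n_attributes
```

Calling share_of_attributes_flagged() typically returns a value near 0.05. That is exactly what a perfectly random sample produces, which is why passing or failing a few attribute checks neither proves nor disproves randomness; the guarantee has to come from the selection method itself.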

Why don't I just take a random sample? Well, I want to. The question is how I can check whether the selection method is really random. So my question is: how can I measure, with statistical methods or instruments, whether a particular technical solution for random sampling is really a good one?
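One way to test the method itself, rather than a single sample, is to run it many times on a small test population and check that every element ends up selected about equally often. A minimal Python sketch; srs_sampler stands in for the technical solution under test, and biased_sampler is a deliberately broken, hypothetical example. Under a method with equal inclusion probabilities, the statistic has an expected value of about population_size * (1 - sample_size / population_size), which is 95 for 100 elements and samples of 5; a value far above that points to unequal probabilities. This checks only equal inclusion probabilities: a simple random sample also requires every pair (and larger subset) of elements to be equally likely, which, for example, systematic sampling does not satisfy.

```python
import random


def srs_sampler(population_size, sample_size, rng):
    # Candidate method under test: simple random sample without replacement
    return rng.sample(range(population_size), sample_size)


def biased_sampler(population_size, sample_size, rng):
    # Deliberately flawed method: never selects the last 20% of the population
    return rng.sample(range(int(population_size * 0.8)), sample_size)


def inclusion_counts(sampler, population_size, sample_size, repetitions, seed=0):
    """Run a sampling method many times and count how often each element is selected."""
    rng = random.Random(seed)
    counts = [0] * population_size
    for _ in range(repetitions):
        for i in sampler(population_size, sample_size, rng):
            counts[i] += 1
    return counts


def uniformity_statistic(counts, sample_size, repetitions):
    """Sum of (observed - expected)^2 / expected over all elements, where the
    expected count is identical for every element under equal inclusion probabilities."""
    expected = repetitions * sample_size / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


fair = uniformity_statistic(inclusion_counts(srs_sampler, 100, 5, 20_000), 5, 20_000)
flawed = uniformity_statistic(inclusion_counts(biased_sampler, 100, 5, 20_000), 5, 20_000)
print(f"fair: {fair:.1f}  flawed: {flawed:.1f}")
```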
 

Mean Joe

#5
The question is, how can I check if the selection method is really random?
You could use a random number generator that has been through peer review, such as the Mersenne Twister. The FAQ explains how to select a uniformly distributed random integer in the range [0..N-1].
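The FAQ's exact recipe isn't reproduced here, but a standard way to turn a generator's fixed-width output into a uniform integer in [0, N-1] is rejection sampling. Taking word % N directly is slightly biased whenever N does not divide 2^32, because some remainders have one more 32-bit word mapping to them than others. A minimal Python sketch; the function name is hypothetical. CPython's random module is itself a Mersenne Twister, and its own random.randrange already avoids this bias, so in Python you would normally just call that; the sketch spells the idea out because it applies to any generator that emits fixed-width words.

```python
import random


def uniform_below(n, rng):
    """Uniform integer in [0, n-1] built from 32-bit random words, using
    rejection sampling so that no value is favoured (avoids modulo bias)."""
    if not 0 < n <= 2**32:
        raise ValueError("n must be in 1..2**32")
    # Largest multiple of n that fits in 32 bits; words at or above it are
    # rejected, so each residue 0..n-1 has exactly the same number of preimages
    limit = (2**32 // n) * n
    while True:
        word = rng.getrandbits(32)
        if word < limit:
            return word % n


rng = random.Random(12345)
print([uniform_below(6, rng) for _ in range(10)])  # ten fair "die rolls" in 0..5
```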

Note: I have not had the chance to use this myself, and I am not trying to market a random number generator virally through TalkStats. I am just trying to pass along something that might be helpful!

Here is an article titled "Pitfalls in Random Number Generation", which explicitly lists some things to look out for when you do your random selection.
 

noetsi

#6
Given your comments about attributes above, I am not sure I really understand what you need.

The question is, how can I check if the selection method is really random?
As Mean Joe noted above, use a random number generator. SAS has one, and I am sure all statistical software does.