For those who haven't read the paper, this is the proposed method for evaluating the random number generators:

(i) Random samples are generated for different sizes (n = 30, 50, 100, 250, 500 and 1000), for different values of the parameter p in the range 0.1(0.1)0.9, from the Bernoulli distribution using the above-mentioned software.

(ii) The maximum likelihood estimates of the parameter are obtained for a fixed sample size and fixed value of the parameter mentioned in step (i).

(iii) The procedure is replicated one hundred times.

(iv) A chi-square goodness-of-fit test is conducted for a given sample size and a given value of p for each package, and the number of poor fits is recorded over the hundred replications.
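As I read it, the procedure amounts to something like the following sketch (my own reconstruction, not the authors' code). One hedge: I test against the nominal p rather than the MLE from step (ii), because for Bernoulli data the MLE makes the expected counts equal the observed counts exactly, which would make the test trivial.

```r
# Rough sketch of the procedure above (my reconstruction, not the authors'
# code). For Bernoulli data the GOF test compares the observed 0/1 counts
# against n*(1-p) and n*p; I test against the nominal p here.
count_poor_fits <- function(n, p, reps = 100, alpha = 0.05) {
  rejections <- 0
  for (i in seq_len(reps)) {
    x <- rbinom(n, size = 1, prob = p)        # n Bernoulli(p) draws
    obs <- c(sum(x == 0), sum(x == 1))        # observed 0/1 counts
    fit <- chisq.test(obs, p = c(1 - p, p))   # GOF against the nominal p
    if (fit$p.value < alpha) rejections <- rejections + 1
  }
  rejections
}

set.seed(42)
count_poor_fits(n = 100, p = 0.5)   # typically somewhere near 5 out of 100
```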

This isn't adequate for judging random number generation. It will tell you if your generator isn't doing its job and is producing observations that really don't follow the specified distribution, but that's only one aspect of random number generation. Honestly, I don't know why they chose the Bernoulli/binomial distribution to evaluate. Random number generation on a computer typically starts with a uniform draw. Both R and SAS (as of SAS 9.0) default to the Mersenne-Twister algorithm (at least if you use RAND in SAS - although it looks like they used RANBIN, which uses a simpler linear congruential generator...).
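You can check the R side of that claim directly from the console:

```r
# Ask R which generators are currently in use; the first element is the
# uniform generator, and in a default session it is the Mersenne-Twister.
RNGkind()
```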

Even ignoring that, there has been *a lot* of research on how to judge random number generators. The authors seem to have decided that they know better and used their own method to evaluate the programs' random number generation abilities. Now keep in mind that all they are doing is looking at goodness of fit. Here is a function that will get you 'random' binomial observations in R:

Code:

```
myrbin <- function(N, n, p){
  ps <- dbinom(0:n, n, p)   # binomial(n, p) probabilities for each value 0..n
  ex <- N * ps              # expected count of each value among N draws
  vals <- round(ex)         # fix the counts at (roughly) their expectations
  sample(rep(0:n, vals))    # return that fixed multiset in shuffled order
}
```

Want to hear a couple of awesome properties of this generator? It will pass the goodness-of-fit test *every time*. By the authors' criteria, this is one hell of an awesome random number generator. Except that it's not. It's ****. Nobody would claim this is a good generator. If I can come up with a horrible generator that looks like the "best" generator under the given criteria, then I think we need to accept that the criteria aren't exactly meaningful. There are lots of things we need to consider - for instance, we actually want the generator to appear random. In my case you might think it looks alright at first glance:

Code:

```
> myrbin(100, 5, .2)
[1] 0 1 1 3 1 1 0 1 0 0 2 1 3 2 1 0 0 0 2 0 0 1 1 0 0 2 0 1 2 0 1 0 0 1 2 1 2
[38] 1 0 1 2 0 1 1 0 1 1 1 0 1 3 1 1 1 1 0 3 1 2 0 2 2 0 0 2 1 1 0 2 2 1 1 1 3
[75] 1 0 1 1 1 4 0 2 2 0 2 1 0 1 0 1 2 2 1 0 0 0 1 0 2 1
> myrbin(100, 5, .2)
[1] 1 0 2 1 0 0 1 2 0 2 1 1 0 1 1 3 1 2 0 0 2 2 1 0 1 1 4 0 0 0 3 0 0 0 2 2 0
[38] 1 1 1 0 1 3 1 0 1 0 0 0 3 2 2 0 1 1 1 1 1 2 1 2 1 0 0 1 1 1 2 1 2 2 0 0 1
[75] 1 0 2 1 1 0 0 1 0 1 0 1 1 2 1 2 1 1 1 2 1 0 2 0 0 3
```

but take a look at it from this perspective:

Code:

```
> table(myrbin(100, 5, .2))
0 1 2 3 4
33 41 20 5 1
> table(myrbin(100, 5, .2))
0 1 2 3 4
33 41 20 5 1
> table(myrbin(100, 5, .2))
0 1 2 3 4
33 41 20 5 1
> table(myrbin(100, 5, .2))
0 1 2 3 4
33 41 20 5 1
```

Doesn't seem so random now eh?

The criteria also don't account for other things, like autocorrelation or other patterns in the generation process.
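To make that concrete, here's a toy comparison (my own sketch, not from the paper): take the same fixed multiset of values myrbin produces, but emit it sorted instead of shuffled. The goodness-of-fit table is identical, yet a simple lag-1 correlation immediately flags the serial pattern.

```r
# Same marginal counts as myrbin(100, 5, 0.2), but emitted in sorted order.
# A GOF test can't tell these apart; lag-1 autocorrelation can.
set.seed(1)
counts <- round(100 * dbinom(0:5, 5, 0.2))  # 33 41 20 5 1 0
patterned <- rep(0:5, counts)               # 0...0 1...1 2...2 3 3 3 3 3 4
shuffled  <- sample(patterned)              # same values, random order

lag1 <- function(x) cor(x[-length(x)], x[-1])  # lag-1 autocorrelation
lag1(patterned)   # strongly positive: obvious serial structure
lag1(shuffled)    # near 0, despite an identical table()
```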

It can be clearly seen from Table 5.2 that overall SAS is performing best as out of 5400 replications only 210 poor fits are observed at 5% level of significance, and is followed by R Language with 212 poor fits out of 5400 tests.

They go on to act as if SAS clearly won here. With 5400 replicates, you would think they would at least point out that the two packages are essentially 'tied' according to their own criteria when SAS and R differ by only 2 poor fits. A quick two-sample proportion test makes the point:

Code:

```
> x <- c(210, 212)
> n <- c(5400, 5400)
> prop.test(x, n)
        2-sample test for equality of proportions with continuity correction

data:  x out of n
X-squared = 0.0025, df = 1, p-value = 0.9604
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.007864511  0.007123770
sample estimates:
    prop 1     prop 2
0.03888889 0.03925926
```

This is a simulation study about random numbers - you would think they would at least account for the variation in the simulation itself.

Another thing to keep in mind is that lower isn't necessarily better here. If we're doing a goodness-of-fit test and the null really is true, then we should expect to reject the test about 100*alpha% of the time. Both packages end up with rejection rates of about 3.9%, when we want this to be 5%. Since we're looking at discrete data, I'm guessing the asymptotics hadn't fully kicked in, so the test statistic wasn't truly chi-square, which is why we see 3.9% instead of 5%. It could also partly be an artifact of the variation in the simulation.
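For scale (my arithmetic, not the paper's): under the null, 5400 tests at the 5% level should produce about 270 rejections, with a binomial standard deviation of roughly 16, so both observed counts sit several standard deviations below expectation - that shortfall, not the 2-rejection gap, is the interesting number.

```r
# Expected rejections and their binomial standard deviation under H0
alpha <- 0.05
n_tests <- 5400
expected <- n_tests * alpha                     # 270
sd_rej <- sqrt(n_tests * alpha * (1 - alpha))   # about 16

(expected - 210) / sd_rej   # SAS: ~3.7 sd below expected
(expected - 212) / sd_rej   # R:   ~3.6 sd below expected
```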

Honestly, I think the main takeaway is that neither SAS nor R fails these ridiculous tests that were administered 5 years ago.