SAS versus R

noetsi

Fortran must die
#1
In the study of statistical packages, simulation from probability distributions is one of the important aspects. This paper is based on a simulation study from the Bernoulli distribution conducted with various popular statistical packages: R, SAS, Minitab, MS Excel and PASW. The accuracy of the generated random data is tested through the Chi-Square goodness of fit test. This simulation study, based on 8,685,000 random numbers and 27,000 tests of significance, shows that the ability to simulate random data from the Bernoulli distribution is best in SAS, closely followed by the R Language, while Minitab showed the worst performance among the compared packages.
This link might take you to the PDF (or might not - I had problems with it).

http://www.google.com/url?sa=t&rct=...wUtihaqHpkpofld4or-0oAg&bvm=bv.71198958,d.cWc

This is the lead author

Manash Pratim Kashyap
Department of Business Administration
Assam University, Silchar, India
kashayap.manashaus@gmail.com
 

Dason

Ambassador to the humans
#2
I only have my phone so I don't want to write a lot but the methodology on that paper seems pretty poor.
 

noetsi

Fortran must die
#3
lol what took you so long dason :p

I found this article, which I am not qualified to comment on, while studying simulation in the context of SAS. But I posted it knowing all the R users would show up :)
 

Dason

Ambassador to the humans
#4
I'll be on campus later and am willing to comment more then. But I wanted to tell you that I love your new signature.
 

Dason

Ambassador to the humans
#6
For those that haven't read the paper, this is their proposed method to evaluate the random number generators:
(i) Random samples are generated for different sizes (n = 30, 50, 100, 250, 500 and 1000) and for different values of the parameter p in the range 0.1(0.1)0.9 from the Bernoulli distribution using the above-mentioned software packages.
(ii) The maximum likelihood estimate of the parameter is obtained for a fixed sample size and a fixed value of the parameter mentioned in step (i).
(iii) The procedure is replicated one hundred times.
(iv) A Chi-Square goodness of fit test is conducted for a given sample size and a given value of p for each package, and the number of poor fits is recorded over the hundred replications.
This isn't adequate to judge random number generation. It will give you an idea if your generator isn't doing its job and you are getting observations that really don't follow the specified distribution, but that's only one aspect of random number generation. Honestly I don't know why they chose the Bernoulli/binomial distribution to evaluate. Random number generation on a computer typically starts with a generated uniform observation, which is then transformed into the target distribution. Both R and SAS (as of SAS 9.0) default to the Mersenne-Twister algorithm (at least if you use RAND in SAS - although it looks like they used RANBIN, which uses a simpler linear congruential generator...).
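For what it's worth, here's a quick way to check (or switch) the uniform generator R is using - nothing from the paper, just a sketch confirming the Mersenne-Twister default:
Code:
# Check which uniform generator R is currently using (defaults to Mersenne-Twister)
RNGkind()

# You can switch generators for comparison, e.g. to an older Knuth algorithm,
# and then restore the default
RNGkind("Knuth-TAOCP-2002")
RNGkind("Mersenne-Twister")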

Even ignoring that, there has been *a lot* of research done on how to judge random number generators. These authors seem to have decided that they know better and used their own method to evaluate the programs' random number generation abilities. Now keep in mind that all they are doing is looking at goodness of fit. Here is a function that will get you 'random' binomial observations in R:
Code:
myrbin <- function(N, n, p){
  ps <- dbinom(0:n, n, p)   # theoretical probability of each value 0..n
  ex <- N * ps              # expected count of each value in a sample of size N
  vals <- round(ex)         # use the rounded expected counts as the observed counts
  sample(rep(0:n, vals))    # shuffle those fixed counts into a 'random-looking' order
}
Want to hear a couple of awesome properties of this generator? It will pass the goodness of fit test *every time*. According to the authors' criteria this is one hell of an awesome random number generator. Except that it's not. It's ****. Nobody would claim that this is a good generator. If I can come up with a horrible generator that looks like the "best" generator under the criteria given, then I think we need to accept that the criteria aren't exactly meaningful. There are lots of things we need to consider - for instance, we actually want the generator to appear random. In my case you might think the output looks alright at first glance:
Code:
> myrbin(100, 5, .2)
  [1] 0 1 1 3 1 1 0 1 0 0 2 1 3 2 1 0 0 0 2 0 0 1 1 0 0 2 0 1 2 0 1 0 0 1 2 1 2
 [38] 1 0 1 2 0 1 1 0 1 1 1 0 1 3 1 1 1 1 0 3 1 2 0 2 2 0 0 2 1 1 0 2 2 1 1 1 3
 [75] 1 0 1 1 1 4 0 2 2 0 2 1 0 1 0 1 2 2 1 0 0 0 1 0 2 1
> myrbin(100, 5, .2)
  [1] 1 0 2 1 0 0 1 2 0 2 1 1 0 1 1 3 1 2 0 0 2 2 1 0 1 1 4 0 0 0 3 0 0 0 2 2 0
 [38] 1 1 1 0 1 3 1 0 1 0 0 0 3 2 2 0 1 1 1 1 1 2 1 2 1 0 0 1 1 1 2 1 2 2 0 0 1
 [75] 1 0 2 1 1 0 0 1 0 1 0 1 1 2 1 2 1 1 1 2 1 0 2 0 0 3
but take a look at it from this perspective:
Code:
> table(myrbin(100, 5, .2))

 0  1  2  3  4 
33 41 20  5  1 
> table(myrbin(100, 5, .2))

 0  1  2  3  4 
33 41 20  5  1 
> table(myrbin(100, 5, .2))

 0  1  2  3  4 
33 41 20  5  1 
> table(myrbin(100, 5, .2))

 0  1  2  3  4 
33 41 20  5  1
Doesn't seem so random now eh?

The criteria also don't account for other things like autocorrelation or patterns in the generation process.
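For example, here's one very basic check the paper never touches - just a sketch of what one might look at, not anything the authors did - using R's built-in generator:
Code:
# A decent generator should show sample autocorrelations near zero at every
# nonzero lag in a stream of draws
x <- rbinom(1000, size = 5, prob = 0.2)
acf(x, lag.max = 20)                         # plot of sample autocorrelations
Box.test(x, lag = 10, type = "Ljung-Box")    # portmanteau test for serial correlation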


It can be clearly seen from Table 5.2 that overall SAS is performing best, as out of 5400 replications only 210 poor fits are observed at the 5% level of significance, and it is followed by the R Language with 212 poor fits out of 5400 tests.
They go on and act like SAS clearly won here. With 5400 replicates, you would think they would at least point out that SAS and R are essentially 'tied' according to their criteria when they only differ by 2 poor fits.
Code:
> x <- c(210, 212)
> n <- c(5400, 5400)
> prop.test(x, n)

	2-sample test for equality of proportions with continuity correction

data:  x out of n
X-squared = 0.0025, df = 1, p-value = 0.9604
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.007864511  0.007123770
sample estimates:
    prop 1     prop 2 
0.03888889 0.03925926
This is a simulation study about random numbers - you would think they would at least account for the variation in the simulation itself.
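A back-of-the-envelope version of the same point (my numbers, not the paper's): with 5400 tests and a rejection rate around 4%, the simulation noise on the count of poor fits is on the order of 14, so a gap of 2 between SAS and R is buried in that noise.
Code:
# Rough standard error of the number of poor fits out of 5400 tests,
# assuming a true rejection rate near 0.04
sqrt(5400 * 0.04 * 0.96)   # about 14.4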

Another thing to keep in mind is that lower isn't necessarily better here. If we're doing a goodness of fit test and the null really is true, then we should expect to reject the test at the nominal alpha level - in other words, about 5% of the time. Both of them end up with rejection rates of about 3.9%, when we want this to be 5%. Since we're looking at discrete data, I'm guessing the asymptotics hadn't fully kicked in yet, so the test statistic wasn't truly chi-square, which is why we see 3.9% instead of 5%. It could also partially be an artifact of the variation in the simulation.
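If you want to see that for yourself, here's a quick simulation - my own sketch, not anything from the paper - of how often a chi-square goodness of fit test rejects when the data genuinely come from R's own rbinom():
Code:
# Estimate the rejection rate of a chi-square GOF test when the null is true
set.seed(42)
n <- 100; size <- 5; p <- 0.2
reps <- 10000
rejects <- replicate(reps, {
  x <- rbinom(n, size, p)
  obs <- tabulate(x + 1, nbins = size + 1)   # observed counts for values 0:size
  suppressWarnings(chisq.test(obs, p = dbinom(0:size, size, p))$p.value) < 0.05
})
mean(rejects)   # nominally 0.05; any gap illustrates the asymptotics point above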

Honestly I think the main takeaway is that neither SAS nor R fails these ridiculous tests that were administered 5 years ago.
 

noetsi

Fortran must die
#13
As I have often noted, I am not a statistician (not smart enough, especially in math, but to some extent in coding as well). But I still use statistics as an analyst. Sort of like a medic compared to a doctor :) Thankfully, for me, the organizations I have worked with aren't looking for very advanced statistics (and in fact they refuse to use it, so you have to reduce the complexity to get them to be interested).