Contributions to writing code that enables R to analyze "descriptive" statistics

#1
Apparently R works only with raw data. Some other statistical programs offer the option to run some simple tests by entering only the descriptive statistics.

I for one think it would be much better if we could enable R to run such tests too. There are not many tests that can be done with only the descriptive statistics: maybe unpaired t-tests, one-way and two-way ANOVAs, chi-square, and perhaps other tests that I don't know can be computed without the raw data.

But that can still be quite useful when we are dealing with another study where we don't have the raw data but need to calculate the P value, or need to assess (as a reviewer) the correctness of the tests used in that study, and we don't want to calculate it manually.

Thus I think their absence from a program that is considered the number one program in statistics is a little bit unjustified.

So I hope I can add the code for a t.test (with only means, SDs, and sample sizes as the input arguments) and perhaps ANOVA and chi-square. But as a super ultra noob in R and also in stats, I think I will need your kind help.
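
To make the one-way ANOVA case concrete, here is a minimal sketch of how the F test could be computed from group means, SDs, and sample sizes alone. The function name `anova_from_summary` and its arguments are hypothetical, not part of base R, and the sketch assumes the classical equal-variance one-way ANOVA.

```r
# Hypothetical helper (not part of base R): classical one-way ANOVA F test
# computed from per-group means, SDs, and sample sizes only.
anova_from_summary <- function(means, sds, ns) {
  k <- length(means)                               # number of groups
  N <- sum(ns)                                     # total sample size
  grand_mean <- sum(ns * means) / N                # weighted grand mean
  ss_between <- sum(ns * (means - grand_mean)^2)   # between-group sum of squares
  ss_within  <- sum((ns - 1) * sds^2)              # within-group sum of squares
  df_between <- k - 1
  df_within  <- N - k
  f_value <- (ss_between / df_between) / (ss_within / df_within)
  p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
  list(f.value = f_value, df = c(df_between, df_within), p.value = p_value)
}

# Example call with made-up summary values for three groups
anova_from_summary(means = c(10, 12, 15), sds = c(2, 3, 2.5), ns = c(20, 22, 19))
```

On raw data this should agree with `oneway.test(..., var.equal = TRUE)`, which is one way to check it.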
 

bryangoodrich

Probably A Mammal
#2

If I understand you correctly, aren't there simple algorithms for using these given values (mean, SD, n, etc.) to evaluate certain hypothesis tests? The problem with trying to make this general (generic), as an R function should be, is that the aggregate values needed to run these tests are case-specific. It shouldn't be hard to make wrappers to run those algorithms, though. It's just math, and in that sense you're just using R as a calculator and writing the sequence of calculations as a specific user-defined function to achieve your goal.
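
As an illustration of the "R as a calculator" idea, a rough sketch of the textbook equal-variance (pooled) two-sample t-test written as a small wrapper might look like the following. The name `pooled_t_from_summary` is made up for this example, and the sketch assumes equal variances in the two groups.

```r
# Hypothetical wrapper (not part of base R): pooled-variance two-sample
# t-test from means, SDs, and sample sizes. Assumes equal variances.
pooled_t_from_summary <- function(mean1, sd1, n1, mean2, sd2, n2) {
  df <- n1 + n2 - 2                                          # pooled degrees of freedom
  pooled_var <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df   # pooled variance estimate
  se <- sqrt(pooled_var * (1 / n1 + 1 / n2))                 # standard error of the difference
  t_stat <- (mean1 - mean2) / se
  p_value <- 2 * pt(-abs(t_stat), df = df)                   # two-tailed P value
  list(t = t_stat, df = df, p.value = p_value)
}

# Example call with made-up summary values
pooled_t_from_summary(mean1 = 10, sd1 = 2, n1 = 30, mean2 = 12, sd2 = 3, n2 = 25)
```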
 
#3

OK, suppose the user enters correct data, so the first version of the function does not yet need to check that all the necessary arguments are present and valid, and suppose we limit ourselves (just for now) to calculating a two-tailed P value with a 95% confidence level and so on. Then it might look like this:

Code:
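t.testDescriptives <- function(mean1, SD1, SS1, mean2, SD2, SS2) {
  # SS1 and SS2 are the sample sizes of the two groups
  # t statistic using the unpooled (unequal-variance) standard error
  t <- (mean1 - mean2) / sqrt((SD1^2 / SS1) + (SD2^2 / SS2))
  # two-tailed P value, using df = SS1 + SS2 - 1
  Pval <- 2 * pt(-abs(t), df = (SS1 + SS2) - 1)
  results <- list(Pval = Pval, t = t)
  return(results)
}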
If it is correct and working, I will proceed with adding some if clauses so that it checks the correctness of the input data... (I can copy most of them from the original t.test)
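
The kind of input checks mentioned here could be sketched roughly as below. This is only an assumption about what such checks might cover (scalar finite numbers, non-negative SDs, at least two observations per group); the helper name `check_summary_inputs` is hypothetical and the checks are not copied from the original t.test.

```r
# Hypothetical validation helper (not taken from stats::t.test).
# Stops with an error message if any summary value is unusable.
check_summary_inputs <- function(mean1, sd1, n1, mean2, sd2, n2) {
  args <- list(mean1 = mean1, sd1 = sd1, n1 = n1,
               mean2 = mean2, sd2 = sd2, n2 = n2)
  for (name in names(args)) {
    value <- args[[name]]
    # each input must be a single finite number
    if (!is.numeric(value) || length(value) != 1 || !is.finite(value)) {
      stop("'", name, "' must be a single finite number")
    }
  }
  if (sd1 < 0 || sd2 < 0) {
    stop("standard deviations must be non-negative")
  }
  if (n1 < 2 || n2 < 2) {
    stop("each group needs at least 2 observations")
  }
  invisible(TRUE)
}

# Example call with made-up summary values; passes silently
check_summary_inputs(mean1 = 10, sd1 = 2, n1 = 30, mean2 = 12, sd2 = 3, n2 = 25)
```

Calling it at the top of the test function would keep the checks separate from the arithmetic.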
 
#4

Originally posted by bryangoodrich:
"If I understand you correctly, aren't there simple algorithms for using these given values (mean, SD, n, etc.) to evaluate certain hypothesis tests? The problem with trying to make this general (generic), as an R function should be, is that the aggregate values needed to run these tests are case-specific. It shouldn't be hard to make wrappers to run those algorithms, though. It's just math, and in that sense you're just using R as a calculator and writing the sequence of calculations as a specific user-defined function to achieve your goal."

Thanks :)

I don't know whether such algorithms exist in R or not, but I think the answer is no. I agree that it is not a problem to do these calculations in other ways :) For example, there are several programs which offer this option. I also don't mind if R is meant only for more advanced purposes and lacks more basic things, whether on purpose or unintentionally.

I just wanted to start writing in R, and this was the idea that came to mind to make things a little more colorful (by possibly starting to write a function). So it is not a major issue, though I still think having such functions in R could attract rookies like me to get used to it :)
 

Dason

Ambassador to the humans
#5

The degrees of freedom you're using would work if you used the pooled estimate of the variance (and assumed equal variances), but you appear not to be assuming equal variances, so you should use the Satterthwaite approximation to the degrees of freedom: http://en.wikipedia.org/wiki/Welch's_t_test
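
A minimal sketch of that correction, assuming the same inputs as before (two means, two SDs, two sample sizes), could look like this. The function name `welch_from_summary` is hypothetical; the degrees of freedom follow the Welch-Satterthwaite formula, df = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)), where v1 = sd1^2 / n1 and v2 = sd2^2 / n2.

```r
# Hypothetical helper (not part of base R): Welch's unequal-variance
# t-test from summary statistics, with Satterthwaite degrees of freedom.
welch_from_summary <- function(mean1, sd1, n1, mean2, sd2, n2) {
  v1 <- sd1^2 / n1                        # squared standard error, group 1
  v2 <- sd2^2 / n2                        # squared standard error, group 2
  t_stat <- (mean1 - mean2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df = df)  # two-tailed P value
  list(t = t_stat, df = df, p.value = p_value)
}

# Example call with made-up summary values
welch_from_summary(mean1 = 10, sd1 = 2, n1 = 30, mean2 = 12, sd2 = 3, n2 = 25)
```

When the raw data are available, the result should match R's default `t.test(x, y)`, which also uses the Welch approximation.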