Calculation of x2-probability - formula

#1
Hey guys.

I'm currently working with Stata and a nominal data set. I've done a chi^2 test and found a P-value (0.023), but I would really like to understand how Stata goes from chi^2 result of 14.5391 and 5df to a P-value of 0.023. Could someone explain to me how to calculate this? I see loads of chi^2 tables but not many formulas. Could someone maybe walk me through it?

I know it's not required to show this formula most of the time, but I'm really curious.
 

TheEcologist

Global Moderator
#2
Hi Starpen,

The p-value calculation comes from the \( \chi^{2}\) distribution.

In essence it is the probability distribution of the sum of the squares of N independent standard normal random variables, where N is the degrees of freedom. The test statistic \(\Sigma (observed - expected)^2 / expected\) approximately follows this distribution when the null hypothesis is true. You can use the distribution (in its cumulative form) to calculate p-values. This may seem like a lot of info if you have never worked with these before, but why don't you read the wiki article on it and post back if you have more questions.
We will be glad to help.
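
To make the "sum of squares" idea concrete, here is a rough simulation sketch in plain Python (not Stata's method, just an illustration). It draws N = 5 standard normal values many times, sums their squares, and counts how often the sum is at least as large as the 14.5391 from the first post. The function name, seed and number of draws are made up for this example.

```python
import random

def simulate_upper_tail(x, df, n_draws=100_000, seed=1):
    """Estimate P(sum of df squared standard normals >= x) by simulation.

    Hypothetical helper for illustration; seed and n_draws are arbitrary.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # One draw from the chi-square distribution with df degrees of freedom
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        if total >= x:
            hits += 1
    return hits / n_draws

estimate = simulate_upper_tail(14.5391, 5)
print(estimate)  # fraction of simulated sums at least as large as 14.5391
```

The printed fraction is only an estimate of the upper-tail probability and moves a little with the seed and the number of draws.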

A nifty little program that can help you get a feeling for distributions and corresponding p-values is PQRS. It shows you the pdf, cdf and formula calculations for a range of distributions along with graphical representations of what you are looking at.

Hope this helps,
 
#3
Guess I'm in way over my head. I get that Stata can go from 14.5391 to 0.023, but I have no idea what to do. I understand "easier" formulas (like chi^2 you posted) but I don't get the other ones. The cumulative formula, if I understand right, is like this:

http://upload.wikimedia.org/wikipedia/en/math/8/5/c/85c553f6bf4385bc372c21a01b1c1e9b.png

And I get that it is used to calculate the probability that the random variable X takes on a value less than or equal to x, but I'm having trouble seeing where my numbers go in. Is this way over my head? I can calculate chi^2 by formula (by hand), but I fail to understand the second part, if I should do it myself.
 

BGM

TS Contributor
#4
Stata goes from chi^2 result of 14.5391 and 5df to a P-value of 0.023
But when I try to compute in R,

Code:
1-pchisq(14.5391,5)
[1] 0.01252437
so this does not quite match the number you posted, assuming you want the usual one-sided (upper-tail) p-value of the chi-square test. Or am I missing something here?

When you need to compute a probability for the chi-square/normal/gamma distributions, in general there is no closed form (except for some special values), so you need software to do the numerical computation for you, or you look the value up in a precomputed table.
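
As a rough idea of what "numerical computation" can mean here, the sketch below integrates the chi-square density from 0 to x with composite Simpson's rule and subtracts the result from 1. This is only one possible approach and not necessarily what R or Stata do internally. The function names and the number of subintervals are assumptions for the example, and the shortcut at x = 0 is only valid for df > 2.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution (sketch, assumes df > 2)."""
    if x <= 0.0:
        return 0.0  # correct at x = 0 only when df > 2
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

def chi2_upper_tail(x, df, n=20_000):
    """1 - CDF, with the CDF approximated by Simpson's rule on [0, x]."""
    h = x / n  # n must be even for Simpson's rule
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3.0

p = chi2_upper_tail(14.5391, 5)
print(round(p, 4))  # 0.0125
```

The result agrees with the `1-pchisq(14.5391,5)` value above to several decimals.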
 
#5
Sorry. I missed a number. **** I'm slow :) Your number is right, I get 0.013 (rounded) in Stata, so I missed the "1-key" :)

So there isn't any formula I can calculate this from? I'm not very mathematical, but I just assumed, that since Stata can do it, I could as well. But that isn't the case?

Btw, thanks to everyone here! Even though I'm doing my masters, this is very new for me. I hope I'll be able to help one day as well.
 

TheEcologist

Global Moderator
#6
The p-value is the probability, under the chi-squared distribution with the given degrees of freedom, of observing a test statistic at least as extreme as the one you calculated (your chi-square value). The cumulative distribution function (CDF) gives the probability of obtaining a value less extreme than this point. To get the probability of a value at least as extreme, you simply subtract the CDF value from 1, which gives the p-value (this is what BGM has done).
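
Written out with the numbers from this thread, and assuming the standard form of the chi-squared CDF (the one given in the Wikipedia article), the subtraction looks roughly like this. Here \(k\) is the degrees of freedom, \(x\) the chi-square value, \(\Gamma\) the Gamma function and \(\gamma\) the lower incomplete gamma function.

```latex
\gamma(s, t) = \int_0^{t} u^{s-1} e^{-u} \, du,
\qquad
F(x; k) = \frac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)},
\qquad
p = 1 - F(x; k)

x = 14.5391, \quad k = 5:
\qquad
p = 1 - \frac{\gamma(2.5,\ 7.26955)}{\Gamma(2.5)} \approx 1 - 0.98748 = 0.01252
```

So the numbers go in as \(k/2 = 2.5\) and \(x/2 = 7.26955\); the hard part is only the integral in the numerator.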

If you use the program PQRS you can see how the CDF and PDF work to give you a p-value (select the distribution, input the df and enter the chi-square value in the slider bar). If you want to calculate it by hand, remember that the Γ(k/2) symbol in the CDF equation denotes the Gamma function. If you are not very confident with equations, then don't worry: knowing what the p-value represents in the chi-squared distribution is already a good level of understanding (beyond just looking it up in a table).
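
For an odd number of degrees of freedom, such as the 5 in this thread, a "by hand" style calculation can be sketched as follows. It relies on two standard facts: Γ(1/2) = √π with Γ(s + 1) = s·Γ(s), and the recurrence Q(s + 1, t) = Q(s, t) + t^s e^(-t) / Γ(s + 1) for the upper tail, starting from Q(1/2, t) = erfc(√t). The function name is made up, and erfc itself still has to come from a table or a library.

```python
import math

def chi2_upper_tail_odd_df(x, df):
    """Upper-tail probability for odd df: erfc plus a finite sum (sketch)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this sketch only handles odd df >= 1")
    t = x / 2.0
    s = 0.5
    tail = math.erfc(math.sqrt(t))  # the df = 1 case, Q(1/2, t)
    while s < df / 2.0:
        # Q(s + 1, t) = Q(s, t) + t^s * e^(-t) / Gamma(s + 1)
        tail += t ** s * math.exp(-t) / math.gamma(s + 1.0)
        s += 1.0
    return tail

# Gamma(2.5) = 1.5 * 0.5 * sqrt(pi), the denominator for 5 degrees of freedom
print(math.gamma(2.5), 1.5 * 0.5 * math.sqrt(math.pi))
print(chi2_upper_tail_odd_df(14.5391, 5))
```

For 5 df the loop adds just two terms to the erfc value, so apart from erfc the whole thing fits on a pocket calculator.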
 
#7
I wish I could do more than just look it up and understand what it means. I really want to understand how to get there, but gamma function and all that is just.... way over my head. Guess I'll just settle with the fact that I don't need to know it :)

Thank you so much for your help!
 

BGM

TS Contributor
#8
One last point to add:

... since Stata can do it, I could as well. But that isn't the case?
Yes you can do it but surely you do not want to do it :)

The CDF is a (regularized) incomplete gamma function, that is, an integral which in general does not have a closed-form solution. Software does the calculation for you via numerical methods, which basically involve a number of iterations and would be tedious to do by hand.
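
To give a feeling for those iterations, here is a minimal sketch that sums the standard power series for the regularized lower incomplete gamma function until the terms become negligible. It is one textbook method, not necessarily the one Stata or R use, and the function name, tolerance and iteration cap are assumptions for the example. It assumes x > 0.

```python
import math

def chi2_cdf_series(x, df, tol=1e-12, max_iter=1000):
    """Chi-square CDF via the series for the lower incomplete gamma function.

    Returns the CDF value and the number of iterations used (sketch, x > 0).
    """
    s, t = df / 2.0, x / 2.0
    term = 1.0 / s          # first term of the series
    total = term
    iterations = 0
    for n in range(1, max_iter + 1):
        term *= t / (s + n)  # next term: previous term times t / (s + n)
        total += term
        iterations = n
        if term < tol * total:
            break
    # Prefactor t^s * e^(-t) / Gamma(s), computed on the log scale
    cdf = total * math.exp(-t + s * math.log(t) - math.lgamma(s))
    return cdf, iterations

cdf, iterations = chi2_cdf_series(14.5391, 5)
print(1.0 - cdf, iterations)  # p-value and how many terms were summed
```

Each pass through the loop is one multiplication and one addition, which is trivial for a computer but a long evening with pencil and paper.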
 
#9
Guess I don't know anything :) Thank you guys for the help. I'd still love to understand this better and I think I'll try to read more about gamma functions, but the road appears long, since the only thing I know about gamma is from my old Hulk comic books :)