Degrees of freedom for Chi2 normality test

#1
I have found conflicting information about the number of degrees of freedom used to evaluate the chi2 statistic when checking the normality of a given data set.

In one case I found df = Nbins-3, and in other cases df = Nbins-1.

The idea of this test is to divide the data set into Nbins bins and compare two values for each bin: 1. the number of data points observed in the bin, and 2. the number of data points expected in the bin, based on the sample's mean and standard deviation and an assumed normal distribution.
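The binning and comparison described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name chi2_statistic, the choice of equal-probability bins, and the simulated data are assumptions made for the example, not part of the original question.

```python
import random
from statistics import NormalDist, mean, stdev

def chi2_statistic(data, nbins):
    """Return (statistic, observed, expected) for a chi2 normality check.

    Hypothetical helper: the mean and standard deviation are estimated
    from the sample itself, and the bins are chosen so that each one has
    the same probability under the fitted normal distribution.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Interior bin edges at the quantiles 1/nbins, 2/nbins, ...;
    # the two outer bins extend to -infinity and +infinity.
    edges = [fitted.inv_cdf(i / nbins) for i in range(1, nbins)]
    observed = [0] * nbins
    for x in data:
        # The bin index equals the number of edges that x has reached.
        observed[sum(x >= e for e in edges)] += 1
    # Equal-probability bins: every bin expects n / nbins points.
    expected = [n / nbins] * nbins
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected

random.seed(0)  # simulated data, for illustration only
sample = [random.gauss(10.0, 2.0) for _ in range(500)]
stat, observed, expected = chi2_statistic(sample, nbins=10)
print(stat, observed)
```

The resulting statistic is then compared against a chi2 distribution, and which degrees of freedom to use for that comparison is exactly the question here.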

The rationale for using df = Nbins-1-2 = Nbins-3 is that two parameters (the sample mean and standard deviation) are estimated from the data and used to compute the expected counts that go into the observed chi2. Hence the -2.

Could anyone clarify the matter? Should I use Nbins-1 or Nbins-3?

Thanks a lot,
R
 

Dragan

Super Moderator
#2
You would use df = (# of bins) - 1 if you knew the (hypothesized) parameters for the mean and standard deviation. An example would be IQ scores with known parameters of Mu=100 and Sigma=15.

Otherwise, if you are using estimates of the population mean and standard deviation, it would be df = (# of bins) - (# of parameter estimates) - 1.
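As a quick check of the arithmetic, here is a small sketch of that rule. The helper name chi2_df and the example of 10 bins are made up for illustration.

```python
def chi2_df(nbins, n_estimated_params=0):
    """Degrees of freedom: (# of bins) - (# of parameter estimates) - 1.

    Hypothetical helper. Use n_estimated_params=0 when the mean and
    standard deviation are known in advance, and 2 when both are
    estimated from the sample.
    """
    df = nbins - n_estimated_params - 1
    if df < 1:
        raise ValueError("not enough bins for the number of estimated parameters")
    return df

print(chi2_df(10))     # known Mu and Sigma: 10 - 1 = 9
print(chi2_df(10, 2))  # mean and std estimated: 10 - 2 - 1 = 7
```

Note that with two estimated parameters at least 4 bins are needed to have any degrees of freedom left.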
 
#3
Thank you Dragan. So the bottom line is that in a normality test I need to use df = Nbins-3.
In fact, if I don't even know whether the data are normally distributed, there is no way I can know the population mean and standard deviation for sure.

Thanks a lot for replying!
 
#4
I already replied, but in fact I had not fully understood. Allow me to summarize, to make sure we are on the same page:

1. I am trying to test the normality of the sample, not of the population.
2. There are two cases: the sample may come from (A) a known population (for which mu and sigma are known) or (B) an unknown one.
3. In case (A) df = Nbins-1, while in case (B) we are estimating two parameters from the sample and therefore use df = Nbins-1-2 = Nbins-3.

Thanks a lot Dragan!