But, in my opinion, two parameters are estimated: the mean is estimated by the sample mean and the variance is estimated by the sample variance (corrected). Then why shouldn't we use the t-distribution with n-2 degrees of freedom?

- Thread starter Rafaelle55

Go ahead and work it out: try to estimate the variance without first having some estimate of the mean (you won't be able to).
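This point can be seen directly in the corrected sample variance formula, which needs the sample mean before it can be computed at all. A minimal Python sketch (the thread mentions R, but the idea is the same; the numbers here are made up):

```python
# The corrected sample variance is s^2 = sum((x_i - x_bar)^2) / (n - 1),
# so the sample mean x_bar must be estimated first.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)

x_bar = sum(xs) / n                                # step 1: estimate the mean
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)   # step 2: the variance uses that estimate

print(x_bar, s2)
```

There is no way to reverse the two steps: every term of the variance sum contains the mean estimate.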

I will try to say in other words what Dason said.

The reduction in the DF is not for every parameter you estimate, but because you estimate the variance based on the estimate of the mean. There is a connection between the mean and the variance, and that is where you lose the degree of freedom.

For example, in a z-test you only estimate the mean, so you don't lose any degrees of freedom.

I found the following explanation; I'm not sure if it is correct. He wrote:

"Also, note in a hypothesis test involving a 1 sample t-distribution the only known is the mean mu. The standard deviation/variance is estimated from the sample size so is not known. If you do know the standard deviation of the population then you should be using a Z-test not a T-test."

https://www.quora.com/Why-is-the-de...mean-and-the-standard-variance-are-both-given

In short, the mean of a normal distribution does not rely on the variance and the variance does not rely on the mean; therefore, when conducting inference you only lose 1 degree of freedom. Also, the t-distribution by definition has a mean of zero, so the only unknown parameter is the variance, again resulting in 1 less degree of freedom.

If you still aren't convinced, write some R code and run a few million iterations. I do this sometimes, because I am skeptical at times too.

My common sense says **n-2**, though I understand it is **not correct**, since everyone knows it is **n-1**...

if you have 5 values: **x1, x2, x3, x4, x5**

if you know the estimate of the mean - **average(x)** - and the estimate of the variance - **sample var(x)** - and you know **x1**, **x2**, and **x3**, then you can calculate **x4** and **x5**
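This claim can actually be checked numerically. Given the sample mean, the corrected sample variance, and x1..x3, the mean equation fixes the sum x4 + x5 and the variance equation fixes their sum of squared deviations, so x4 and x5 are the two roots of a quadratic and can be recovered (up to which is which). A Python sketch with made-up numbers:

```python
import math

# Hypothetical data: 5 values, of which only x1..x3 are "known"
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # corrected sample variance
x1, x2, x3 = xs[:3]

# From the mean: x4 + x5 = n*mean - (x1 + x2 + x3)
s = n * mean - (x1 + x2 + x3)
# From the variance: (x4-mean)^2 + (x5-mean)^2 = (n-1)*var - known deviations
r = (n - 1) * var - sum((x - mean) ** 2 for x in (x1, x2, x3))
# Expand the squares: x4^2 + x5^2 = r + 2*mean*s - 2*mean^2
q = r + 2 * mean * s - 2 * mean ** 2
# x4, x5 are the roots of t^2 - s*t + p = 0, with p = x4*x5 = (s^2 - q)/2
p = (s * s - q) / 2
disc = math.sqrt(s * s - 4 * p)
x4, x5 = (s - disc) / 2, (s + disc) / 2

print(sorted([x4, x5]))   # recovers 4.0 and 10.0 (up to order)
```

So the arithmetic in the post is right; the subtlety is that the degrees of freedom count only the constraints imposed *before* the variance is estimated, not every quantity you happen to know.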

Sorry if I still don't understand.

I think this is different; my case is when the standard deviation is estimated and therefore not known. Of course when the standard deviation is known, a Z-test should be used.

Think of the data points as creating an n-dimensional hyperplane covering all possibilities of the data. The mean and variance can then each take all their possible values along a 1-dimensional vector (each having its own). So the resulting hyperplane has the dimension of the data possibilities minus the maximum dimension of the parameters.

If you have a distribution like a gamma, where the two parameters are dependent, then the possibilities of the parameters form a 2-dimensional plane, and the resulting degrees of freedom are n - 2.

I do encourage you to write some code and see for yourself. Create a random sample of normal data with known mean and variance. Then construct the interval for the mean using degrees of freedom n-1 and n-2. Repeat this a couple thousand times and compare the reliability of the intervals against the intervals' nominal confidence. For a 95% CI you will see that n-1 captures the mean 95% of the time, but n-2 captures it more than 95% of the time.
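That simulation is easy to sketch in Python (R, as suggested earlier in the thread, would work equally well). The sample size of 5 is an arbitrary choice, and the 97.5th-percentile t critical values are hardcoded from a standard t table:

```python
import random
import statistics

random.seed(1)
n = 5
true_mean, true_sd = 0.0, 1.0
# 97.5th-percentile t critical values from a t table: df = n-1 and df = n-2
t_crit = {4: 2.776, 3: 3.182}

trials = 20000
hits = {4: 0, 3: 0}
for _ in range(trials):
    xs = [random.gauss(true_mean, true_sd) for _ in range(n)]
    m = statistics.mean(xs)
    se = statistics.stdev(xs) / n ** 0.5     # corrected sample sd / sqrt(n)
    for df, t in t_crit.items():
        if m - t * se <= true_mean <= m + t * se:
            hits[df] += 1

for df in (4, 3):
    print(f"df={df}: coverage = {hits[df] / trials:.3f}")
```

With df = n-1 = 4 the empirical coverage comes out close to the nominal 95%; with df = n-2 = 3 the wider critical value over-covers, capturing the true mean noticeably more than 95% of the time.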

The degrees of freedom really are associated with the variance estimate. In the case of a z-test we *know* the variance so we don't need to estimate anything before we can get what we use for the variance. In the case of a t-test we need to estimate the mean before we can even estimate the variance since it is required for the variance calculation. That's why we lose one degree of freedom - we needed to estimate one parameter before we could estimate the variance. In a simple linear regression we need to estimate the intercept and the slope before we can get a prediction for each point which is required before we estimate the variance. So in that case we lose 2 degrees of freedom because we estimated 2 parameters before we could estimate the variance.
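The regression case can be made concrete with a short Python sketch: the least-squares residuals satisfy two linear constraints (they sum to zero and are orthogonal to x), so only n-2 of them are free, which is exactly why the residual variance divides by n-2. The data here are made up:

```python
# Simple linear regression fit by hand, with made-up data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)

mx = sum(xs) / n
my = sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)      # slope
a = my - b * mx                         # intercept

resid = [y - (a + b * x) for x, y in zip(xs, ys)]

# The normal equations force two constraints on the residuals:
c1 = sum(resid)                               # = 0: sum of residuals
c2 = sum(x * e for x, e in zip(xs, resid))    # = 0: orthogonal to x
print(round(c1, 10), round(c2, 10))

# Only n-2 residuals are free, so the variance estimate divides by n-2:
s2 = sum(e * e for e in resid) / (n - 2)
```

Each parameter estimated before the variance (here the intercept and the slope) pins down one linear constraint on the residuals, and each constraint costs one degree of freedom.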

A quick question though: why are the degrees of freedom associated with the variance estimate? Is it linked to the definition of a "degree of freedom"?

Maybe you can use my earlier five-values example to explain to me the definition of a degree of freedom (more accurately, where I misunderstand the definition of a degree of freedom).
