# t vs Z, distributions and testing

Status
Not open for further replies.

#### joeb33050

##### Member
My understanding of t and Z testing was that a t test is appropriate when n < about 30 and σ is unknown (Z testing being inaccurate when n < about 30), and that a Z test is appropriate when n > about 30 and σ is known. When n > 30, s is close to σ and can be corrected.

The t distribution and t testing REQUIRE that σ be estimated; if σ is known and used, the t distribution is inappropriate and incorrect. The t distribution includes an s correction.

The t distribution is appropriate when n < about 30 and σ is ESTIMATED. Or is it? If σ is known or can be estimated accurately, the Z distribution and testing work fine.

The t distribution is all about the inaccuracy of s; s becomes less accurate as n decreases.

The Normal distribution and Z testing work fine if s is close to σ, or if s is corrected when n < about 30.

A Monte Carlo estimate of σ with σ = 1 shows that, approximately:

[graph and graph table posted as an attachment; not reproduced here]
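A minimal stdlib sketch of such a Monte Carlo estimate (illustrative code, not the original attachment; `mean_s` is a made-up name), assuming the "root mean squared" s with the n - 1 divisor:

```python
import random
import statistics

random.seed(1)

# Average of the sample standard deviation s over many samples of
# size n drawn from N(0, 1), i.e. sigma = 1.
def mean_s(n, reps=40_000):
    total = 0.0
    for _ in range(reps):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        total += statistics.stdev(sample)  # n - 1 divisor
    return total / reps

for n in (2, 5, 10, 30):
    print(n, round(mean_s(n), 3))
```

For n = 2 the average lands near .797 and climbs toward 1 as n grows, matching the figures quoted later in the thread.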


#### obh

##### Well-Known Member
Very simple: if you know the standard deviation, use the Z distribution; if you estimate the standard deviation, use the t-distribution.

When n > 30 and you estimate the standard deviation, please use the t-test. Today there is no reason to use the z-test when you can use the more correct t-test. (In the past there was a reason, when people used tables and the z-test table was more detailed.)

#### joeb33050

##### Member
My interest is in understanding WHAT statistics is, and not HOW to do it.

The t distributions are Normal distributions with s corrected to σ.

(σ / s) * s = σ

σ / s is a value that varies with n and is easily estimated.

Each t distribution uses the σ / s appropriate to that n to correct s to σ.

Thus, t calculations must use s rather than σ, else the calculations are incorrect.

Knowing σ is not required in Z testing, since σ = s * (σ / s).

A table of σ / s vs. n, and we could forget t forever.

t, s = (x̄₁ - x̄₂) / (s / √n)

Z, σ = (x̄₁ - x̄₂) / (σ / √n)

P (t ≤ t, s) = P (Z ≤ Z, σ)

#### Dason

I mean... That's wrong.

> A table of σ / s vs. n

Do you somehow think s is constant? The issue is that it's a random variable. It will change with every sample.
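This point is easy to see by simulation; a minimal sketch (illustrative code, not from the thread):

```python
import random
import statistics

random.seed(42)

# Draw repeated samples of the same size n = 4 from N(0, 1)
# and record the sample standard deviation s each time.
n = 4
s_values = [statistics.stdev([random.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(10_000)]

print("first five s values:", [round(s, 3) for s in s_values[:5]])
# s has a spread of its own; it is not a constant for a given n.
print("standard deviation of s itself:", round(statistics.stdev(s_values), 3))
```

Every sample produces a different s, and at n = 4 the spread of s around its mean is substantial.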

#### joeb33050

##### Member
For Normal distributions, at each n, sigma / s bar is a constant. See the top table above. For n = 2, sigma / s bar is about 1.242. t distributions are Normal distributions with the sample s multiplied by sigma / s bar, the constant. The v = 1, n = 2, t distribution is a Normal distribution with each s multiplied by ~1.242. See the right 2 columns in the bottom table. If this isn't clear, I'll do an example.

#### joeb33050

##### Member
With random, Normal, n = 4, σ = 1 variables, the average of 40,000 values of s was .921. σ / s bar =

1 / .921 = 1.086.

Let x̄₁ = 1.1, x̄₂ = 1

Z = (x̄₁ - x̄₂) / (s / √n)

Z = .1 / 1 / 2 = .05

P (Z ≤ .05) = .520

t = (x̄₁ - x̄₂) / (s / √n), but multiply s by 1.086 to correct s

t = (.1) / ((s * 1.086) / √n)

t = .1 / (.921 * 1.086) / 2

t = .1 / 1 / 2 = .05

P (t ≤ .05, v = 3) = .5184 ≈ .520
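The two tail probabilities quoted here can be checked with the standard normal CDF and the known closed-form CDF of Student's t with v = 3 (the helper name `t3_cdf` is mine):

```python
import math
from statistics import NormalDist

# Closed-form CDF of Student's t with v = 3 degrees of freedom
# (a known closed form; the helper name t3_cdf is mine).
def t3_cdf(t):
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi

print(round(NormalDist().cdf(0.05), 4))  # 0.5199, i.e. ~ .520
print(round(t3_cdf(0.05), 4))            # 0.5184
```

This confirms only the quoted CDF values at .05, not the test-statistic arithmetic that produced .05.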

The small difference is estimating error.

#### Dason

How could you ever possibly believe that sigma/s is a constant when s is a random variable that changes from sample to sample?

Maybe it's just a case of you not explaining yourself correctly, but being able to adequately articulate your reasoning is vital.

I would suggest reading some introductory mathematical statistics books to brush up on the fundamentals. Any good one covers everything you're attempting, and you should be able to figure out where you're going wrong.

#### joeb33050

##### Member
I have failed again. Let me make it as simple as possible.

(1) If we estimate, using the popular "root mean squared" method, the standard deviation of samples of n random Normal variables with standard deviation sigma, then the average of a large number of those sample standard deviations approaches a value we will call "s bar".

(2) For each n, sigma / s bar is a unique number > 1, decreasing as n increases. This is why t tests are said to be more accurate than Z tests; not true when we know or correctly estimate sigma.

(3) The/each t distribution, v = n - 1, includes the calculation s * (sigma / s bar) = (s / s bar) * sigma ≈ 1 * sigma; s then becomes sigma, or an estimate of sigma.

(4) And, thus, P (t ≤ t test) = P (Z ≤ Z test), when Z = (x̄₁ - x̄₂) / (σ / √n).
Thus, a t distribution is a Normal distribution when s = sigma.

Let me know where you get stuck and I'll try to help.

#### Dason

What exactly is your math/stat background?

#### Dason

I'm not trying to be a jerk but you aren't very good at articulating your reasoning and frankly you make a lot of incorrect assertions. Add to that that in almost all of your threads it's very difficult to understand what your goal/motivation is. So yeah I'm just wondering what kind of background you have to give some better context. If you don't want to answer that's fine but I have no obligation to respond to this thread at all.

#### joeb33050

##### Member
If the reader will tell me the first point where this is incorrect or not understood; then I will explain or correct my error.

In both t and Z testing we estimate the probability that two Normal distributions, 1 and 2, have means such that µ₁ = µ₂.

Sample means, sample or true standard deviations, and sample n are used.

The inputs are s, σ, x̄₁, x̄₂, and n; call these a set of inputs.

t = (x̄₁ - x̄₂) / (s / √n)

Z = (x̄₁ - x̄₂) / (σ / √n)

For any set of inputs, P (t ≤ t test) = P (Z ≤ Z test)

For any set of inputs, any n, the probabilities, t or Z, are identical.

For ANY n, Z test results equal t test results.

In t testing, s MUST be used; in Z testing, σ, or an unbiased estimator of σ, MUST be used.

If, for any set of inputs, P (t ≤ t test) = P (Z ≤ Z test), then Z tests could replace t tests. Correcting the s input to the Z test makes the Z probabilities equal the t probabilities.

Next:

s is an estimate of σ, the RMS estimate. s bar, the average of many estimates of s, is < σ for all sample sizes, all values of n. At small values of n, σ / s bar is much greater than 1; at large values of n, σ / s bar is only slightly greater than 1.

A Monte Carlo simulation of 400,000 estimates of s bar with σ = 1 shows that at n = 2, s bar ≈ .797; at n = 30, s bar ≈ .992.
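The simulated values of s bar match a known closed form: for Normal samples, E[s] = c4(n) · σ with c4(n) = √(2 / (n − 1)) · Γ(n / 2) / Γ((n − 1) / 2). A quick check (illustrative code):

```python
import math

# Exact expectation of s for Normal samples: E[s] = c4(n) * sigma,
# with c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2).
def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

print(round(c4(2), 3))   # 0.798, vs. the simulated .797
print(round(c4(30), 3))  # 0.991, vs. the simulated .992
```

So no Monte Carlo run is actually needed to get these constants; they are the standard c4 bias-correction factors.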

σ / s bar is a constant for each and every n, and (σ / s bar) * s corrects s such that, on average, the corrected s equals σ.

Because s bar is always < σ, and t tests are as accurate as Z tests that use σ, Z tests using uncorrected s as an estimate of σ will ALWAYS be less correct than t tests.

Because, for any set of inputs, P (t ≤ t test) = P (Z ≤ Z test), the s input to t calculations MUST be corrected within the t system by σ / s bar. Without that correction the probabilities cannot be equal.

#### obh

##### Well-Known Member
> I'm not trying to be a jerk but you aren't very good at articulating your reasoning and frankly you make a lot of incorrect assertions. Add to that that in almost all of your threads it's very difficult to understand what your goal/motivation is. So yeah I'm just wondering what kind of background you have to give some better context. If you don't want to answer that's fine but I have no obligation to respond to this thread at all.

Not a jerk, only great

#### Dason

> If the reader will tell me the first point where this is incorrect or not understood; then I will explain or correct my error.

#### joeb33050

##### Member
As I suspected, s estimated from the range is unbiased; but the standard deviation of the range-based s is greater than that of the RMS s. I'm trying to figure out the cost/benefit.

#### joeb33050

##### Member
EXPLANATION: Z AND t TESTING

Both Z and t tests estimate the probability that samples from two Normal distributions, 1 and 2, come from populations with equal means; we then make inferences about whether µ₁ = µ₂.

With both tests, a test statistic, named here "Z test" or "t test", is calculated.

t test = (x̄₁ - x̄₂) / (s / √n)

Z test = (x̄₁ - x̄₂) / (σ / √n)

The commonly stated instructions are:

Use the t test when n < about 30 and σ is unknown.

Use the Z test when n > about 30 and σ is known, or when n is large enough that s is an adequately accurate estimator of σ.

What follows is the WHY behind these instructions.

THE VARIANCE AND STANDARD DEVIATION are explained in that section of this book, above.

One estimator of σ, called "s", the sample standard deviation, is described and explained. The formula for s is:

s = √( Σ (xᵢ - x̄)² / (n - 1) )

This formula is used to estimate σ, the population standard deviation. The s that we are talking about here, call it "s rms" for this discussion, is AN estimator of σ, the most frequently used of the several estimators of σ.
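The "root mean squared" s can be spelled out in a short sketch (illustrative code; `s_rms` is a made-up name) and checked against the library routine:

```python
import math
import statistics

# The "root mean squared" sample standard deviation, spelled out:
# s = sqrt( sum (x_i - xbar)^2 / (n - 1) ).
def s_rms(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(s_rms(data))             # ~2.138
print(statistics.stdev(data))  # identical: same formula
```

Both lines print the same number, since `statistics.stdev` uses the same n - 1 divisor.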

t testing

s rms is a biased, that is, on average incorrect, estimator of σ.

A Monte Carlo simulation of 400,000 estimates of s rms led to the table below (posted as an attachment).

S RMS denotes the average of the estimates of σ = 1; at n = 2, S RMS = .797.

1 / S RMS is the correction factor; at n = 2, 1 / S RMS = 1.254, and 1.254 * .797 ≈ 1.

With n = 2, S RMS is .797 = 79.7% of σ. By n = 30, S RMS is .992 = 99.2% of σ. The bias, the error in S RMS, matters most when n is small.

There is a t distribution for every n, where n > 1.

Each t distribution has, built into it, the value of the appropriate 1 / S RMS correction factor, and uses that to correct s rms to σ.

t test = (x̄₁ - x̄₂) / (s / √n)

s is one input to t test; s rms must be that input.

The t distribution corrects s rms to a closer estimate of σ, then solves for Z.

The result is that P (t ≤ t test) = P (Z ≤ Z test), which can only be true if

t test = (x̄₁ - x̄₂) / (σ / √n).

Then, the instruction "Use the t test when n < about 30 and σ is unknown." should read:

Use the t test when n < about 30 and s is s rms, an estimate of σ using the s formula above.

If any estimate of σ other than s rms is used, the t test result will be incorrect.

Z testing

Use the Z test when n > about 30, σ is known, or n is large enough that s is an adequately accurate estimator of σ.

Z test = (x̄₁ - x̄₂) / (σ / √n)

σ = s rms * (1 / S RMS)

Multiplying s rms by 1 / S RMS corrects s rms to σ, and means that Z testing is as accurate as t testing, even when n < 30.

Comments about the instruction: “Use the Z test when n > about 30, σ is known, or n is large enough that s is an adequately accurate estimator of σ.”

If σ is known and used, then for any n, P (Z ≤ Z test) = P (t ≤ t test); a t test is not required.

If σ is known, then the < > 30 business can just go away.

If σ is estimated by s rms * (1 / S RMS), then P (Z ≤ Z test) = P (t ≤ t test), and a Z test is always appropriate; a t test is never required.

It is not true that Z testing is less accurate than t testing; at any n, P (Z ≤ Z test) = P (t ≤ t test). What is true is that if uncorrected s rms is used in Z testing, then P (Z ≤ Z test) < P (t ≤ t test). Z testing requires knowing and using σ; using large n reduces but never eliminates the estimating error.

Conclusions

If Gosset had had this table, he would not have had to calculate the many t distributions or get into degrees of freedom, t testing, and the rules for Z and t testing. We could just cut that section out of the texts and Z test away.

s rms is one of several estimators of σ. One of my favorites is the range estimator: the range, at each n, is on average "c" standard deviations wide. c varies with n, so range / c is an unbiased estimator of σ; call it "s r". s r is easy to calculate and unbiased, but has a larger standard error than s rms. I am trying to make a sensible comparison.
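The range estimator described here can be sketched by simulation, assuming the standard control-chart constant d2(4) ≈ 2.059 for the mean range of a Normal 4-sample (illustrative code, not the poster's comparison):

```python
import random
import statistics

random.seed(7)

# Range estimator of sigma: for Normal samples, E[range] = d2(n) * sigma.
# d2(4) ~ 2.059 is the standard control-chart constant (assumed here).
n, d2, reps = 4, 2.059, 50_000

s_rms_vals, s_range_vals = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    s_rms_vals.append(statistics.stdev(sample))
    s_range_vals.append((max(sample) - min(sample)) / d2)

# The range estimator averages near sigma = 1 (unbiased),
# but its spread is larger than that of s rms.
print(round(statistics.mean(s_range_vals), 3))
print(round(statistics.stdev(s_range_vals), 3),
      round(statistics.stdev(s_rms_vals), 3))
```

The output illustrates the trade-off named in the post: range / d2 averages close to σ, while its sample-to-sample spread exceeds that of s rms.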


#### Dason

To summarize: it seems you think the only issue causing us to need the t distribution is that the sample standard deviation is a biased estimate of sigma, and that your correction fixes that.

s being biased isn't the issue; it's that it is random. All of your talk about replacing s with your new estimate, which supposedly makes it equal to sigma, just isn't correct.
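This point can be checked by simulation (illustrative code; the c4(4) value 0.9213 and the 1.96 cutoff are assumed inputs, not from the thread): even after the bias correction, comparing the t statistic to Normal critical values rejects far more than the nominal 5% at n = 4.

```python
import random
import statistics

random.seed(3)

# Under H0 (mu = 0, sigma = 1), compute T = xbar * sqrt(n) / s for many
# samples of size n = 4. Then "correct" s for bias by dividing it by
# c4(4) ~ 0.9213 (assumed value), and compare both statistics to the
# Normal two-sided 5% cutoff of 1.96.
n, c4_4, reps = 4, 0.9213, 100_000
sqrt_n = n ** 0.5

reject_raw = reject_corrected = 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    t = statistics.mean(sample) * sqrt_n / statistics.stdev(sample)
    if abs(t) > 1.96:
        reject_raw += 1
    # Dividing s by c4 multiplies t by c4; same as using the corrected s.
    if abs(t * c4_4) > 1.96:
        reject_corrected += 1

print(reject_raw / reps)        # far above the nominal 0.05
print(reject_corrected / reps)  # still far above 0.05
```

The bias correction changes the rejection rate only slightly; the excess over 5% comes from the sample-to-sample randomness of s, which is exactly what the t critical values account for.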

Once again I ask: what is your math/statistics background? I wouldn't try using measure theory to explain something to a 5-year-old. I need to know what level you're on, because I'm not going to waste my time with responses consisting of theory you can't follow.

#### joeb33050

##### Member
> To summarize: it seems you think the only issue causing us to need to use the T distribution is that the sample standard deviation is a biased estimate of sigma and that your correction fixes that.

Wrong; you don't understand what I wrote. Do you understand that a t distribution, corrected per the table, IS a Normal distribution? Do you understand that a Normal distribution with sigma known and used equals a t distribution, at ANY n?

YOU don't understand.

#### Dason

> Wrong, you don't understand what I wrote. Do you understand that a t distribution IS corrected per the table, a Normal distribution? Do you understand that a Normal distribution with sigma known and used = a t distribution, at ANY n? YOU don't understand.

Nope. That isn't correct. And you haven't given any valid mathematical justification for why you think it even should be correct.

I don't think you want to discuss your background because it isn't particularly strong. There is nothing wrong with that, but you really should be willing to entertain the idea that you might not be correct.

#### joeb33050

##### Member
> Nope. That isn't correct. And you haven't given any valid mathematical justification for why you think it even should be correct.

What isn't correct? How about: P (t ≤ t test) = P (Z ≤ Z test). Is that correct?
