One Sample T-Test

#1
For an exercise I received speed measurements taken at two points on a road. The cars drove along an urban road (30 mph limit) that opened into a rural road (60 mph limit). The speed of each car was measured 200 meters before the end of the urban area (speed1) and again 600 meters past the end of the urban area (speed2).

I want to check whether the cars that violate the speed limit on the urban road (30 mph) also tend to violate the speed limit on the rural road (60 mph).

Speed1 is normally distributed. Speed2 does not show a nice bell-shaped histogram, although its skewness and kurtosis are within limits. I recoded the data to test my question, so I'm not sure whether I'm still allowed to do a One Sample T-Test:

First I copied only the speed1 violators (>30 mph) into a new variable.
Next I copied the speed2 values of the cars that violated speed1 into a new variable ('speed 2 violated').
To show that the cars that violated speed1 also tend to violate speed2, I was thinking of doing a One Sample T-Test on 'speed 2 violated' with 60 mph as the test value.
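
The three steps above could be sketched in Python roughly as follows. The speed values are made up for illustration (they are not the exercise data) and `one_sample_t` is a hypothetical helper name. Only the standard library is used, so the sketch stops at the t statistic and degrees of freedom and does not compute a p-value.

```python
import math
import statistics

def one_sample_t(values, test_value):
    """Return (t, df) for a one-sample t-test of mean(values) against test_value."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - test_value) / se, n - 1

# Hypothetical (speed1, speed2) pairs in mph, one pair per car.
cars = [(28, 55), (33, 66), (35, 72), (29, 58), (31, 64), (38, 75), (30, 61)]

# Steps 1 and 2: keep speed2 only for cars that exceeded 30 mph at speed1.
speed2_violated = [s2 for s1, s2 in cars if s1 > 30]

# Step 3: one-sample t-test of that subgroup against the 60 mph limit.
t_stat, df = one_sample_t(speed2_violated, 60)
print(f"n={len(speed2_violated)}, t={t_stat:.3f}, df={df}")
```

A positive t here only says the subgroup mean lies above 60 mph; it says nothing about cars that did not speed at speed1.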

The One Sample T-test results show a significant difference to the test value (60mph):
t= 10.806
df=14
p<.001 (2-tailed)
Mean Diff=11.8

Is this a valid analysis to prove speed1 violators (30mph) also tend to violate speed 2 (60mph)?

Edit: In class we didn't discuss the One Sample Wilcoxon Signed Rank test, so I'm not allowed to use it. However, that test also rejected the null hypothesis, so Speed2Over is significantly above the limit.
 

bryangoodrich

Probably A Mammal
#2
If you want to compare the two samples and you know which cars were speeding, then you'll want to do a paired (two-sample) t-test to compare whether those that were speeding differed or not between the two road segments.

To be clear, a one-sample t-test will simply tell you about that one sample. It is not correct to compare two samples by the results of their one-sample t-tests.
 

Dason

Ambassador to the humans
#3
To be clear, a one-sample t-test will simply tell you about that one sample. It is not correct to compare two samples by the results of their one-sample t-tests.
Although if the question is just whether those that sped for the 30 mph range will speed at the 60 mph range then it might be reasonable to use the approach they proposed. However this won't answer the question of whether those that speed at 30 mph range are different from those that don't speed at the 30 mph range when it comes to the 60 mph range. For that you would probably want to do a two sample t-test.

I don't think pairing is necessarily needed here if the 30 mph range is just used to create the groups. You lose some information but that might be alright given the questions of interest.
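
A minimal sketch of the two-sample comparison described here, assuming speed1 is used only to form the two groups. The data are invented and `welch_t` is a hypothetical helper. It uses the Welch (unequal-variance) form of the statistic, which is one common choice and not necessarily the variant meant above.

```python
import math
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic and approximate df (Welch-Satterthwaite)."""
    va = statistics.variance(a) / len(a)   # squared standard error, group a
    vb = statistics.variance(b) / len(b)   # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical (speed1, speed2) pairs in mph.
cars = [(28, 55), (33, 66), (35, 72), (29, 58),
        (31, 64), (38, 75), (30, 61), (27, 57)]

# speed1 only decides group membership; the comparison is on speed2.
speeders = [s2 for s1, s2 in cars if s1 > 30]
others = [s2 for s1, s2 in cars if s1 <= 30]

t_stat, df = welch_t(speeders, others)
print(f"t={t_stat:.3f}, approx df={df:.2f}")
```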
 

bryangoodrich

Probably A Mammal
#4
Well aren't you Mr Know-It-All! I guess I misread what his intent was, and actually looked at what he said a bit more closely.

I'd listen to this guy, Downburstx. He knows what he's talking about, unlike others (viz., me).
 
#5
Thank you Bryan and Dason for the answers.

However, I'm still wondering whether I'm allowed to use these parametric tests when speed 2 isn't nicely bell shaped (not normally distributed).

Any comments on that?

Thanks in advance, stat masters :)
 

Karabiner

TS Contributor
#6
I want to check the data if the cars that violate the speed on the urban road (30 mph) also tend to violate the speed on the rural road (60mph).
So you just have to calculate the relative frequency of speed violations (yes/no)
at 60mph in your subgroup of 30mph-speed violators. Or what exactly do you
mean by "tend to"?
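
As a rough sketch of that relative frequency, assuming the data are available as (speed1, speed2) pairs; the values below are invented for illustration:

```python
# Hypothetical (speed1, speed2) pairs in mph, one pair per car.
cars = [(28, 55), (33, 66), (35, 72), (29, 58),
        (31, 64), (38, 75), (30, 61), (34, 59)]

# Subgroup: cars above the 30 mph limit at the first measurement point.
subgroup = [s2 for s1, s2 in cars if s1 > 30]

# Count how many of them are also above the 60 mph limit.
violations = sum(s2 > 60 for s2 in subgroup)
rel_freq = violations / len(subgroup)
print(f"{violations} of {len(subgroup)} subgroup cars exceed 60 mph ({rel_freq:.0%})")
```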

Kind regards

K.
 
#7
So you just have to calculate the relative frequency of speed violations (yes/no)
at 60mph in your subgroup of 30mph-speed violators. Or what exactly do you
mean by "tend to"?
By "tend to" I mean: do cars that violated 30 mph (speed1) also significantly violate 60 mph (speed 2)?
English isn't my first language, but I'll try to restate my purpose:

I have a data set with car IDs. For every car, the speed was measured on the 30 mph road (speed1) and afterwards on the 60 mph road (speed2). I want to test, only for the cars that violate speed1 (30 mph), whether they also significantly violate speed 2.

What I did so far is:

IF speed1 is greater than 30 mph THEN copy speed 1 and speed 2 into new variables.
This results in a data set with only the cars that violated the 30 mph limit (speed 1 violated) in the first column, alongside their associated measured speed 2 (which can be under or over 60 mph).

Now I don't know how to test that cars in this new data set (who violate 30 mph) also significantly violate speed 2 (60 mph).

I was initially thinking of a One Sample T-Test on the new speed 2 data set, comparing it against a test value of 60. This shows a highly significant (p<.001) difference (+8mph). However, since speed 2 is not normally distributed, I'm not allowed to do the analysis this way.

Does anybody have an idea?

Karabiner, your comment made me think: maybe I should interpret my hypothesis as "Is there a correlation between the speed 1 violators and their associated speed 2?", or something like that...
 

Karabiner

TS Contributor
#8
I have a data set with car ID's. For every car is measured what their speed is at the 30mph road (speed1) and their speed afterwards on the 60mph road (speed2). I want to test only the cars who violate speed1 (30mph) also significantly violate speed 2.
I guess you are not completely familiar with the exact meaning of statistical significance,
and you didn't define what you yourself mean by "significant" here, either.
You have to translate vague descriptions into a precise statement. E.g. "I want to
test whether the mean speed of the cars is > 60mph" (that is what you did), or
"I want to test whether the proportion of 60mph violators in my subgroup is larger
than (say) 25%" or "I want to know if any of the subjects who violated 30mph also
violated 60mph" (that would be easy).
Now I don't know how to test that cars in this new data set (who violate 30 mph) also significantly violate speed 2 (60 mph).
Again, what should "significant" mean? Statistical significance means that your sample data do
not agree with your null hypothesis. So you have to state a null hypothesis first (see examples above).
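
To make one of those examples concrete, here is a hedged sketch of the "proportion larger than (say) 25%" version as a one-sided exact binomial test. The null hypothesis is that the proportion of 60mph violators in the subgroup is at most 25%. The subgroup size of 15 follows from the df=14 reported earlier; the count of 9 violators is purely hypothetical, and `binom_upper_tail` is an illustrative helper name.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

n_subgroup = 15     # subgroup size implied by df = 14 in the first post
n_violators = 9     # hypothetical number of subgroup cars above 60 mph
p0 = 0.25           # null-hypothesis proportion, the "(say) 25%" example

# One-sided p-value: probability of seeing this many violators or more
# if the true proportion were exactly p0.
p_value = binom_upper_tail(n_violators, n_subgroup, p0)
print(f"one-sided exact binomial p-value: {p_value:.4f}")
```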
Karabiner, your comment made me think: Maybe I should interpreted my hypothesis as: Is there a correlation between the speed 1 violators and their associated speed 2, or something...
Why do you have to interpret your own hypotheses? You can formulate
hypotheses as you wish. You didn't mention that there were restrictions.
Personally, I would find a fourfold table interesting (violation 30mph yes/no
versus violation 60mph yes/no).
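
Such a fourfold table could be tallied along these lines; the (speed1, speed2) pairs are invented for illustration and the layout of the printout is just one possibility:

```python
from collections import Counter

# Hypothetical (speed1, speed2) pairs in mph, one pair per car.
cars = [(28, 55), (33, 66), (35, 72), (29, 58),
        (31, 64), (38, 75), (30, 61), (34, 59)]

# Key each car by (violated 30 mph?, violated 60 mph?) and count the cells.
table = Counter((s1 > 30, s2 > 60) for s1, s2 in cars)

print("               speed2 > 60   speed2 <= 60")
for v30 in (True, False):
    label = "speed1 > 30 " if v30 else "speed1 <= 30"
    print(f"{label}   {table[(v30, True)]:^11}   {table[(v30, False)]:^12}")
```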

Kind regards

K.
 
#9
I'm indeed not really familiar with all the statistical tests.... :(

Anyway let's try it again....

I want to test with 95% certainty whether the cars that violate the 30 mph limit (speed1) on the urban road also tend to violate the 60 mph limit (speed2) on the rural road. So I want to run a test only on the associated speeds (speed2) of the cars that violate 30 mph at speed1. With this resulting data set I want to reject that the mean of speed 2 is lower than 60 mph and reject that it is equal to 60 mph.
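
Stated as hypotheses, this reads as H0: mean speed2 <= 60 mph against H1: mean speed2 > 60 mph, tested one-sided at the 5% level. A small sketch of that decision, using the t and df reported in the first post and a critical value taken from a standard t table; it still rests on the normality assumption questioned in this thread:

```python
# Values reported in the first post for the 'speed 2 violated' subgroup.
t_reported = 10.806
df = 14

# Upper 5% point of Student's t with 14 df, from a standard t table.
t_crit_one_sided = 1.761

# H0: mean speed2 <= 60 mph   versus   H1: mean speed2 > 60 mph
reject_h0 = t_reported > t_crit_one_sided
print("reject H0" if reject_h0 else "do not reject H0")
```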

Most classmates just ran a Spearman's rho correlation on the data, but I can't understand how that would test my question.
 

Karabiner

TS Contributor
#10
Spearman indeed does not answer the question of whether subjects from your
subgroup violate the 60 mph limit. It just tells you whether those with relatively
higher (or lower) speed at the 30 mph limit tend to be those who have higher
(or lower) speed at 60 mph. Relative to other subjects, not relative to the limit.
So even if all subjects were below 30mph and 60mph, respectively, there could
nevertheless be a high correlation. You said you wanted to test the mean speed
against 60mph, and that's what you did. For a t-test with df=14 you need a sample
from a normally distributed population, which you do not have. Speed is often skewed.
Maybe a log transformation (newspeed=LOG(oldspeed) ) would make your data
more symmetric.
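
Both points in this reply can be illustrated with a small standard-library sketch. All speeds are invented, and `ranks`, `spearman_rho` and `skewness` are hypothetical helper names. The first part shows a perfect Spearman correlation although every car is below both limits; the second compares moment-based skewness before and after a log transform. Whether the transform helps on the real data cannot be told from this toy example.

```python
import math
import statistics

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def skewness(values):
    """Moment-based sample skewness, m3 / m2**1.5."""
    m = statistics.mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

# Part 1: every car is below 30 mph and below 60 mph, yet rho is 1.
speed1 = [22, 24, 25, 27, 29]
speed2 = [41, 45, 50, 52, 58]
rho = spearman_rho(speed1, speed2)
print(f"Spearman rho = {rho:.2f}, max speed2 = {max(speed2)} mph")

# Part 2: right-skewed hypothetical speeds, before and after the log transform.
oldspeed = [50, 55, 61, 67, 73, 81, 89]
newspeed = [math.log(v) for v in oldspeed]
skew_raw = skewness(oldspeed)
skew_log = skewness(newspeed)
print(f"skewness raw = {skew_raw:.3f}, after log = {skew_log:.3f}")
```

Note that a t-test on the log scale against LOG(60) is a statement about the geometric mean of the speeds, not the arithmetic mean.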

Kind regards

K.