Isolating an effect for a distribution

Embee

New Member
Hello stats gurus, I'm working on a project that has been perplexing me for a bit...

I have 24 histograms (one for each hour of the day). They are roughly normal in shape, except that the size of the tails and the skewness/kurtosis change throughout the day. Several variables cause these changes, but what I'm trying to do is isolate the effect that just one variable (traffic) has on the shape of the histogram. I ran linear regressions: traffic against kurtosis gave an R^2 of 0.07 (not very impressive), and traffic against skewness gave an R^2 of only 0.03.

I'm wondering if my procedure of using regression makes sense or if I should be doing something totally different. My ultimate goal is to "correct" for the traffic variable in the future (i.e. accurately remove its effects) but right now I'm not confident as to how to go about doing that. Any feedback would be most appreciated.

JohnM

TS Contributor
Try correlating "traffic" with other aspects of the distribution, other than skewness / kurtosis:

- the mean or median

- the variance or standard deviation

- the proportion of the curve/histogram:
(a) above a certain value(s)
(b) below a certain value(s)
(c) in between certain values
(d) outside of certain values
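
As a rough illustration of the list above, here is a minimal Python sketch that computes each suggested summary for every hour and correlates it with traffic. Everything in it is an assumption made for illustration: the data are randomly generated, the cut-off values 18.0 and 22.0 are arbitrary, and the helper names `pearson_r` and `hourly_summaries` are made up.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def hourly_summaries(readings, low, high):
    """Summary statistics for one hour's readings (low/high are chosen cut-offs)."""
    n = len(readings)
    return {
        "mean": statistics.fmean(readings),
        "median": statistics.median(readings),
        "sd": statistics.stdev(readings),
        "prop_above": sum(r > high for r in readings) / n,
        "prop_below": sum(r < low for r in readings) / n,
        "prop_between": sum(low <= r <= high for r in readings) / n,
        "prop_outside": sum(r < low or r > high for r in readings) / n,
    }

# Synthetic stand-in data (NOT real measurements): 24 hourly traffic counts,
# and 200 readings per hour whose spread grows with traffic.
rng = random.Random(0)
traffic = [rng.randint(10, 500) for _ in range(24)]
hours = [[rng.gauss(20, 1 + t / 250) for _ in range(200)] for t in traffic]

summaries = [hourly_summaries(h, low=18.0, high=22.0) for h in hours]
for name in summaries[0]:
    values = [s[name] for s in summaries]
    r = pearson_r(traffic, values)
    print(f"{name:>13}: r = {r:+.2f}, r^2 = {r * r:.2f}")
```

With real data, the 24 lists in `hours` would be the actual hourly readings, and the cut-offs would be whatever values matter for the application.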

Embee

New Member
Thanks John, I was thinking along the same lines. I'm trying to isolate the effect that traffic has on road temperature. Whenever a car goes by, the road temperature changes, so during rush hour the temperature histogram is more spread out than it is during an hour in the middle of the night.

I did a regression analysis of traffic count versus the interdecile range of the histogram and found an R^2 of 0.1.

What I'm not sure of is
a) is that method appropriate?
b) how to interpret the result to learn exactly what effect traffic has on temperature - so I can correct for it in the future.
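
For reference, the regression described in this post could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the data are randomly generated, the interdecile range is taken as the 90th minus the 10th percentile using the standard library's default quantile method, and the function names `interdecile_range` and `simple_regression` are made up.

```python
import random
import statistics

def interdecile_range(readings):
    """Distance between the 10th and 90th percentiles of one hour's readings."""
    deciles = statistics.quantiles(readings, n=10)  # 9 cut points
    return deciles[-1] - deciles[0]

def simple_regression(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic stand-in data (NOT real measurements).
rng = random.Random(1)
traffic = [rng.randint(10, 500) for _ in range(24)]
idr = [
    interdecile_range([rng.gauss(20, 1 + t / 250) for _ in range(200)])
    for t in traffic
]

slope, intercept, r_squared = simple_regression(traffic, idr)
print(f"slope = {slope:.4f} per vehicle, intercept = {intercept:.2f}, "
      f"R^2 = {r_squared:.2f}")
```

The slope is the estimated change in interdecile range per additional vehicle, which is the quantity to look at when asking what effect traffic has. R^2 answers a different question: how much of the hour-to-hour variation in spread traffic accounts for.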

JohnM

TS Contributor
So the traffic volume isn't strongly correlated with the variance or the standard deviation?

Another variable you might try to correlate with traffic volume is the coefficient of variation, which is the standard deviation divided by the average, expressed as a percentage:

%CV = (sd / mean) × 100

This gives a standardized measure of the variation when you need to compare it across groups of data with different means.
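
A minimal Python sketch of that calculation, assuming the sample standard deviation is used; the function name `percent_cv` and the readings are made up for illustration:

```python
import statistics

def percent_cv(readings):
    """Coefficient of variation as a percentage: 100 * sd / mean."""
    mean = statistics.fmean(readings)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(readings) / mean

# Made-up readings for two hours with similar means but different spread.
quiet_hour = [19.8, 20.1, 20.0, 19.9, 20.2]
rush_hour = [18.5, 21.7, 20.3, 19.0, 22.5]

print(f"quiet hour %CV: {percent_cv(quiet_hour):.1f}")
print(f"rush hour  %CV: {percent_cv(rush_hour):.1f}")
```

One thing to keep in mind: because the mean is in the denominator, the value depends on where the measurement scale puts its zero, so a CV computed on temperatures in Celsius will differ from one computed on the same temperatures in Kelvin.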

Embee

New Member
Well, I'm thinking that since some of the histograms are a little skewed, it's not appropriate to use variance or standard deviation (though maybe I'm wrong). So you're saying something like the interdecile range isn't a good thing to use?

JohnM

TS Contributor
Well, the r^2 was only 0.1, so it's not explaining a heck of a lot.....

Embee

New Member
True, although I just ran a regression with standard deviation and got an r^2 of 0.11 and the one with the CV was even less.

But the thing is, I know traffic only has a slight effect (there are many other factors affecting road temperature) so maybe this does make sense after all? Do you think?

JohnM

TS Contributor
That's probably true - maybe it's a combination of air temperature and traffic volume.
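
If the spread does depend on both air temperature and traffic volume, one way to explore it would be a regression with both predictors. This also gives a traffic coefficient that can be used to "correct" each hour back to a reference traffic level. The Python sketch below is only an illustration under strong assumptions: the data are randomly generated, the relationship is assumed linear, and the names `fit_two_predictors` and `reference_traffic` are made up.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (centered normal equations)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Synthetic stand-in data (NOT real measurements): hourly traffic counts,
# air temperatures, and a spread measure built from both plus noise.
rng = random.Random(2)
traffic = [rng.randint(10, 500) for _ in range(24)]
air_temp = [rng.uniform(0, 15) for _ in range(24)]
spread = [
    1.0 + 0.004 * t + 0.05 * a + rng.gauss(0, 0.2)
    for t, a in zip(traffic, air_temp)
]

b0, b_traffic, b_air = fit_two_predictors(traffic, air_temp, spread)
print(f"traffic coefficient: {b_traffic:.4f}, air temp coefficient: {b_air:.4f}")

# "Correct" each hour's spread to what the model predicts at a reference
# traffic level (here, the quietest hour), holding air temperature fixed.
reference_traffic = min(traffic)
corrected = [s - b_traffic * (t - reference_traffic)
             for s, t in zip(spread, traffic)]
```

The correction is only as good as the fitted traffic coefficient, so with a weak relationship it would be worth checking how uncertain that coefficient is before relying on it.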