T-test applicable with exponentially distributed samples?

axs

New Member
#1
Hi

I have a few datasets of quality ratings for different producers (i.e. x1 ratings for producer y1, x2 ratings for producer y2 etc). The ratings are exponentially distributed (as shown by the histogram). The data sample sizes vary between 1000 and 100,000.
The differences between the means of these datasets are relatively small, so I would like to test whether they are statistically significant. With my basic knowledge of statistics, I would assume that a t-test would be a good choice for that. However, most (but not all) texts on the t-test assume normally distributed data. Is the t-test applicable in my case, where I have exponentially distributed data? If not, what would be the appropriate test?
 
#2
You don't need your data to follow a normal distribution, just the test statistic. In this case you are comparing means, which for samples as large as yours are approximately normally distributed by the central limit theorem. So you can use a t-test if you would like.
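To see the central limit theorem at work here, a minimal simulation sketch follows. It assumes a hypothetical mean rating of 3.0 and uses only the Python standard library; the helper names `skewness` and `sample_means` are made up for illustration. It compares the skewness of raw exponential draws with the skewness of many sample means.

```python
import random
from statistics import fmean, pstdev

def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    xs = list(xs)
    m = fmean(xs)
    s = pstdev(xs, mu=m)
    return fmean([((x - m) / s) ** 3 for x in xs])

def sample_means(n, reps, mean, seed=0):
    """Means of `reps` independent exponential samples, each of size `n`."""
    rng = random.Random(seed)
    rate = 1.0 / mean
    return [fmean([rng.expovariate(rate) for _ in range(n)]) for _ in range(reps)]

# Hypothetical setup: exponential "ratings" with mean 3.0.
rng = random.Random(42)
raw = [rng.expovariate(1.0 / 3.0) for _ in range(20000)]
means = sample_means(n=400, reps=2000, mean=3.0, seed=42)

# The raw data are strongly right-skewed; the sample means are close to symmetric.
print("skewness of raw draws:   ", round(skewness(raw), 3))
print("skewness of sample means:", round(skewness(means), 3))
```

The theoretical skewness of an exponential distribution is 2, and that of the mean of n independent draws is 2/sqrt(n), so with n in the thousands the sampling distribution of the mean is already very close to symmetric.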

Since your populations are non-normal, your test may be underpowered. You may want to consider a non-parametric test such as Mann-Whitney:

http://en.wikipedia.org/wiki/Mann-Whitney_U_test
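As a rough sketch of both options, the snippet below runs a t-test and a Mann-Whitney U test on simulated exponential data. Several things here are assumptions and not from the thread: the two producer means (3.0 and 3.5), the sample size, the choice of the Welch (unequal variance) form of the t-test, and the function names `welch_t` and `mann_whitney_u`. Both p-values use a normal approximation, which should be reasonable for samples in the thousands.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def welch_t(a, b):
    """Welch's t statistic with a two-sided normal-approximation p-value."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

def mann_whitney_u(a, b):
    """U statistic for sample a (midranks for ties) and a two-sided
    p-value from the tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical data: two producers with exponential ratings.
rng = random.Random(0)
producer_1 = [rng.expovariate(1 / 3.0) for _ in range(5000)]
producer_2 = [rng.expovariate(1 / 3.5) for _ in range(5000)]

print("Welch t, p:       ", welch_t(producer_1, producer_2))
print("Mann-Whitney U, p:", mann_whitney_u(producer_1, producer_2))
```

Keep in mind that the two tests answer slightly different questions: the t-test compares means, while Mann-Whitney asks whether values from one group tend to be larger than values from the other.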
 

axs

New Member
#3
Thanks a lot for the useful answer. One more question: assume I want to compare medians (since these might make more sense than means in the case of exponentially distributed data). Can the t-test or the U-test be used in this case? If so, why?
 
#4
You cannot t-test the medians. As far as I know, there is no nice form for the sampling distribution of the median. The reason you could do a t-test on the means even though they arise from non-normal data is that when you average all of your data points, the mean itself is approximately normally distributed by the CLT.

Given that you have such large samples, you could bootstrap the sampling distribution of the median and perform the test. Alternatively, you could use the U-test here again.
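One way to carry out the bootstrap idea is sketched below, under stated assumptions: it builds a percentile confidence interval for the difference of medians and treats the difference as significant if the interval excludes zero. The percentile interval is just one of several bootstrap variants, and the function name `bootstrap_median_diff`, the simulated means (3.0 and 4.5), and the sample sizes are hypothetical.

```python
import random
from statistics import median

def bootstrap_median_diff(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Observed median(a) - median(b) and a percentile bootstrap
    confidence interval for that difference."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(median(resample_a) - median(resample_b))
    diffs.sort()
    lo = diffs[int(n_boot * alpha / 2)]
    hi = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return median(a) - median(b), (lo, hi)

# Hypothetical data: exponential ratings for two producers.
rng = random.Random(7)
producer_1 = [rng.expovariate(1 / 3.0) for _ in range(1000)]
producer_2 = [rng.expovariate(1 / 4.5) for _ in range(1000)]

observed, (lo, hi) = bootstrap_median_diff(producer_1, producer_2, n_boot=500)
print("observed difference of medians:", round(observed, 3))
print("95% bootstrap interval:", (round(lo, 3), round(hi, 3)))
print("interval excludes zero:", not (lo <= 0 <= hi))
```

With 100,000 ratings per producer, a pure-Python loop like this will be slow; a vectorized library would be the practical choice, but the logic stays the same.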


Hope this helps.
 
#5
Sorry for hijacking the thread - if I used t-tests to reduce a linear model but my sample distribution was not normal, would the CLT still apply?