Rules of thumb with normal and non-normal distributions

#1
Hi,
I am hoping someone can clarify a rule of thumb I was taught for choosing the correct statistical test based on the distribution of the data. For context, this rule was taught as part of a statistics lesson in a high school ecology class.

I was told that if the data was measured (e.g. height, weight, length), I could assume it would be normally distributed. If the data was counted (frequencies), I could assume it would be non-normally distributed. Based on this alone, I would choose the corresponding parametric or non-parametric test.

I can't seem to find any information online that backs up this assumption. Could someone please let me know if this makes sense?

Any information on a simple way to check normality using basic Excel skills would also be greatly appreciated.

Many thanks :)
 

katxt

Active Member
#2
One way of checking normality is by eye: check whether the normal probability plot is straight. Unfortunately, Excel doesn't produce this plot natively. If you are confined to Excel, the attached spreadsheet gives a simple way. Dump your data into column A and see if the resulting graph is more or less straight. If so, the data is normal enough. Google "normal probability plot" images to see how to interpret obviously bent graphs.
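A minimal sketch of what a normal probability plot computes, in Python using only the standard library. The attached spreadsheet isn't reproduced here, so this is not necessarily how it works. Each sorted value is paired with the standard normal quantile for its rank, using the plotting position (i - 0.5)/n (one common convention; other software may use slightly different ones). The correlation between the two columns is a rough numeric summary of straightness: values near 1 mean a nearly straight plot. The function names and sample data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_probability_points(data):
    """Pair each sorted value with the normal quantile expected at its rank.

    Uses the plotting position (i - 0.5) / n, one common choice among several.
    """
    xs = sorted(data)
    n = len(xs)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, xs


def straightness(z, xs):
    """Pearson correlation between theoretical quantiles and sorted data.

    Values close to 1 mean the probability plot is close to a straight line.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(xs) / n
    cov = sum((a - mz) * (b - mx) for a, b in zip(z, xs))
    sz = math.sqrt(sum((a - mz) ** 2 for a in z))
    sx = math.sqrt(sum((b - mx) ** 2 for b in xs))
    return cov / (sz * sx)


rng = random.Random(1)
normal_sample = [rng.gauss(170, 10) for _ in range(200)]  # e.g. heights in cm
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]  # right-skewed data

r_normal = straightness(*normal_probability_points(normal_sample))
r_skewed = straightness(*normal_probability_points(skewed_sample))
print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

In Excel, the same two columns can be built with `NORM.S.INV` applied to the plotting positions, and `CORREL` gives the correlation; a scatter chart of one column against the other is the probability plot itself.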
 

Miner

TS Contributor
#3
This rule of thumb is vastly oversimplified and is only true in a few simple cases.

While it is true that count data is technically not normally distributed (it typically follows the Poisson distribution), it can be treated as approximately normally distributed when the counts are larger (see the normal approximation to the Poisson distribution). The same holds true for counts of binary outcomes, which follow the binomial distribution (see the normal approximation to the binomial distribution).
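To see why larger counts make the normal approximation workable, here is a small Python sketch (standard library only) that compares the exact Poisson cumulative probability with a normal distribution of the same mean and variance (both equal to the mean count), using the usual continuity correction of 0.5. The mean counts tried are arbitrary examples.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    # Sum P(X = j) for j = 0..k, computed in log space to avoid overflow
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(k + 1))


def normal_approx_cdf(k, lam):
    # Normal with mean lam and variance lam, with a continuity correction of 0.5
    return NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(k + 0.5)


def max_error(lam):
    # Largest gap between exact and approximate CDF over a wide range of counts
    upper = int(lam + 6 * math.sqrt(lam)) + 1
    return max(abs(poisson_cdf(k, lam) - normal_approx_cdf(k, lam))
               for k in range(upper + 1))


for lam in (2, 10, 100):
    print(f"mean count {lam:>3}: max CDF error = {max_error(lam):.4f}")
```

The largest gap between the two CDFs shrinks as the mean count grows. For the number of successes out of n binary trials, the analogous approximation uses a normal distribution with mean np and variance np(1 - p).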

The same holds true in reverse. Measured (continuous) data that are subject to random influences will often be normally distributed. However, measured data that are subject to nonrandom influences, or that are bounded, are not normally distributed. There are numerous distributions (e.g., lognormal, exponential, logistic, extreme value, Weibull) that apply instead. Measurements of time, which are bounded by zero and sometimes subject to human behavior (deadlines and procrastination), often follow the lognormal distribution.
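A quick illustration of the point about times, assuming hypothetical task durations drawn from a lognormal distribution (the log of the time is normal with mean 0 and standard deviation 1). The raw times are all positive and right-skewed, with the mean pulled above the median, while their logarithms are roughly symmetric. Standard-library Python only; the parameters and seed are arbitrary.

```python
import math
import random
import statistics


def skewness(xs):
    # Sample skewness (Fisher-Pearson, no small-sample correction)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(42)
# Hypothetical task durations in hours: log(time) ~ Normal(0, 1), so every
# time is positive (bounded by zero) and there is a long right tail.
times = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
log_times = [math.log(t) for t in times]

print(f"raw times: mean={statistics.fmean(times):.2f} "
      f"median={statistics.median(times):.2f} skew={skewness(times):.2f}")
print(f"log times: mean={statistics.fmean(log_times):.2f} "
      f"median={statistics.median(log_times):.2f} skew={skewness(log_times):.2f}")
```

Checking whether the logs look normal (for example with a probability plot) is one way to judge whether the lognormal is a reasonable description of such data.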

Rules of thumb only work for extremely simple situations. My guess is that your teacher didn't want to overwhelm you in high school. However, they did you a disservice by confusing you now that you want to learn more.
 

noetsi

Fortran must die
#4
Since we have these things called computers, just run a QQ plot and you don't need to eyeball it by hand :)

Normality is way less important anyhow than some think. It usually only affects the confidence interval, and mainly with small sample sizes. And it's doubtful how many real-world data sets are normal.
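A small simulation sketch of the last point, assuming an exponential population (strongly skewed, mean 1) purely for illustration: it estimates how often a nominal 95% t-interval for the mean actually contains the true mean at a small and a larger sample size. The t critical values are hard-coded so that only the standard library is needed.

```python
import math
import random
import statistics

# Two-sided 95% critical values of Student's t, keyed by sample size n:
# t(0.975, df=4) is about 2.776 and t(0.975, df=99) is about 1.984.
T_CRIT = {5: 2.776, 100: 1.984}


def coverage(n, reps=5000, seed=0):
    """Fraction of t-intervals for the mean that contain the true mean (1.0).

    Samples come from an exponential distribution with mean 1, a skewed,
    non-normal population chosen only for illustration.
    """
    rng = random.Random(seed)
    t = T_CRIT[n]
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        m = statistics.fmean(xs)
        half_width = t * statistics.stdev(xs) / math.sqrt(n)
        if m - half_width <= 1.0 <= m + half_width:
            hits += 1
    return hits / reps


for n in (5, 100):
    print(f"n = {n:>3}: coverage of nominal 95% interval = {coverage(n):.3f}")
```

With skewed data the small-sample interval should cover the true mean noticeably less often than 95%, while at n = 100 the coverage comes much closer to nominal. How large a sample is "large enough" depends on how non-normal the data are.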