How to interpret the Lilliefors test?

#1
Hi everyone,

I'm doing a study of 2000+ participants and would like to test some continuous variables for normality (I already know from the histogram that my age distribution is bimodal). I applied the Lilliefors test in R and got a p-value of <0.0001, but I'm confused about how to interpret this. This link suggests that a low p-value confirms normality: https://towardsdatascience.com/6-wa...al-distribution-which-one-to-use-9dcf47d8fa93 but everywhere else I read, a low p-value indicates non-normality. I wonder if the author got it wrong.

Thanks!
 

Karabiner

TS Contributor
#2
Why do you want to test whether a sample of 2000 was drawn from a normally distributed population? That information seems quite useless.

Quoting: "but everywhere else"
"Everywhere else" is right. The null hypothesis is that the population from which the sample is drawn is normally distributed, and p < 0.0001 means you reject that hypothesis.
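To make the direction of the test concrete, here is a minimal sketch of the Lilliefors idea in plain Python (the original poster used R; standard-library Python and the function names here are my own assumptions, not an R or library API). It computes the Kolmogorov-Smirnov distance D between the data and a normal distribution fitted with the sample's own mean and SD, then gets a p-value by simulating normal samples (re-estimating mean and SD each time) instead of looking up the published tables. The bimodal "ages" are simulated, not the poster's data.

```python
import math
import random
import statistics
from statistics import NormalDist


def ks_stat_vs_fitted_normal(sample):
    """KS distance between the sample's ECDF and a normal with the sample's own mean/SD."""
    xs = sorted(sample)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    fitted = NormalDist(mean, sd)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def lilliefors_mc(sample, n_sim=200, seed=0):
    """Monte Carlo Lilliefors test. H0: the population is normal (mean and SD unknown)."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = ks_stat_vs_fitted_normal(sample)
    # Null distribution of D: fresh normal samples, with mean/SD re-estimated each time.
    # Re-estimating the parameters is what distinguishes Lilliefors from the plain KS test.
    exceed = sum(
        ks_stat_vs_fitted_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) >= d_obs
        for _ in range(n_sim)
    )
    p_value = (exceed + 1) / (n_sim + 1)
    return d_obs, p_value


# Hypothetical bimodal "age" data: two simulated groups, NOT the poster's data.
rng = random.Random(42)
ages = [rng.gauss(25, 4) for _ in range(1000)] + [rng.gauss(60, 6) for _ in range(1000)]
d, p = lilliefors_mc(ages)
print(f"D = {d:.3f}, p = {p:.4f}")
# Small p => reject H0 "the population is normally distributed".
```

With 200 simulations the smallest possible p-value is 1/201 (about 0.005), so resolving something like p < 0.0001 would need many more simulations; the direction of the conclusion is the same either way.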

With kind regards

Karabiner
 
#3
Thanks Karabiner! Why isn't it useful? I'm analysing many subgroups, each with at least 400 participants, and each group has a different distribution pattern: a few are bimodal, a few others are slightly skewed, and at least one doesn't seem to have a pattern. So I'm trying to objectively determine normality for some of the slightly skewed ones to decide whether I need parametric or non-parametric tests. I also read on StackExchange that if my sample size is large enough (above 30?), I can apply t-tests to non-normally distributed data. Is this true? I experimented with one variable (age ~ ethnicity), and whether or not I assume normality (i.e., a parametric vs. a non-parametric test) makes a huge difference to my p-values (<0.001 versus 0.07).
 

Karabiner

TS Contributor
#4
Quoting: "each group has a different distribution pattern - a few are bimodal, a few others are slightly skewed, and at least one doesn't seem to have a pattern, so I'm trying to objectively determine normality for some of the slightly skewed ones to see if I need to use parametric or non-parametric tests."
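The distribution of the dependent variable (in the population) does not need to be normal.
Some analyses, such as the t-test and linear regression, assume that the errors (estimated by the residuals) are normally distributed in the population.
But even this assumption stops mattering once the sample size is large enough
(n > 30 or so; see the central limit theorem), because the sampling distribution of the mean, and hence of the test statistic, is then approximately normal. With n > 400 per group you certainly needn't worry about it.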
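A quick simulation can illustrate the central-limit-theorem argument. This is a sketch under stated assumptions: an exponential distribution stands in for a skewed variable, and the skewness of many simulated sample means is used as a rough gauge of how far the sampling distribution of the mean is from normal. For the exponential distribution, theory says the skewness of the mean of n values is 2/√n, so it should shrink steadily as n grows. The helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (simple moment estimator): mean of cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def skew_of_sample_means(n, reps=1000, seed=0):
    """Skewness of the means of `reps` samples of size n drawn from an
    exponential distribution (whose own skewness is 2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)


for n in (5, 30, 400):
    print(f"n = {n:3d}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

This concerns the distribution of the mean, which is what a t-test compares; it does not make the raw data themselves any more normal.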

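By the way, with n > 400 a normality test will almost always reject the "normally distributed population" hypothesis,
whether the deviation is small or large: real data are rarely exactly normal, and the test's power grows with n, so even trivial deviations become "significant". So not only is normality irrelevant here,
such significance tests are also useless.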
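The large-sample point can be checked with a back-of-the-envelope calculation. The sketch below (standard-library Python, hypothetical helper names) takes a mildly skewed population, a lognormal distribution with small σ, measures its maximum CDF distance D from the normal distribution with the same mean and variance, and compares D with the commonly tabulated large-sample 5% critical value of the Lilliefors statistic, approximately 0.886/√n. Because the critical value shrinks like 1/√n while D stays fixed, any fixed deviation eventually becomes "significant". It ignores sampling noise, so the n it reports is only a rough indication of where rejection becomes likely, not an exact power calculation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ks_distance_lognormal_vs_normal(sigma, grid_points=20000):
    """Max distance between the lognormal(0, sigma) CDF and the normal CDF with the
    same mean and variance, evaluated on a fine grid (an approximation of the sup)."""
    mean = math.exp(sigma ** 2 / 2)
    sd = math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(sigma ** 2))
    matched = NormalDist(mean, sd)
    # The lognormal CDF is 0 for x <= 0, so start the grid just above 0 if needed.
    lo, hi = max(1e-9, mean - 6 * sd), mean + 6 * sd
    d = 0.0
    for k in range(grid_points + 1):
        x = lo + (hi - lo) * k / grid_points
        f_lognormal = STD_NORMAL.cdf(math.log(x) / sigma)
        d = max(d, abs(f_lognormal - matched.cdf(x)))
    return d


def approx_n_for_rejection(d, crit_const=0.886):
    """Rough sample size at which a fixed deviation d reaches the approximate
    large-sample 5% Lilliefors critical value crit_const / sqrt(n)."""
    return math.ceil((crit_const / d) ** 2)


for sigma in (0.1, 0.2, 0.3):
    d = ks_distance_lognormal_vs_normal(sigma)
    # Population skewness of a lognormal distribution.
    skew = (math.exp(sigma ** 2) + 2) * math.sqrt(math.exp(sigma ** 2) - 1)
    print(f"sigma = {sigma}: skewness = {skew:.2f}, D = {d:.4f}, "
          f"rejection likely from roughly n = {approx_n_for_rejection(d)}")
```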

With kind regards

Karabiner