How can I analyze a non-parametric dependent variable?

0.05

New Member
#1
I have a continuous dependent variable and two categorical predictors. I wanted to fit a linear regression, but the dependent variable does not meet the assumption of normality in the residuals. Transformations do not work. So what options do I have to analyze this? I was looking into things like GAM, but I'm not familiar with it. Is it OK to use GAM, or are there better approaches?
 

noetsi

Fortran must die
#2
Non-parametric describes a method, not a variable (as your title suggests). Normally, when the distribution is not normal, you then use non-parametric methods. The same variable can be analyzed with parametric or non-parametric methods.

There are lots of options, starting with the realization that if you have enough cases normality may not even matter because of the central limit theorem. How did you test for non-normality (and what form did it take)? Data can be non-normal in many ways, and the solution depends on how and why it is non-normal.
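To make the central limit theorem point concrete, here is a rough simulation sketch. It assumes a strongly right-skewed population (an exponential distribution, chosen only for illustration and not taken from the poster's data) and compares the skewness of raw draws with the skewness of means of n = 300 draws. The helper names `skewness` and `skewness_of_means` are made up for this example.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def skewness_of_means(n, reps=2000, seed=1):
    """Skewness of `reps` sample means, each computed from n exponential draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return skewness(means)


# Raw data: illustrative exponential draws (assumption, not the thread's data).
rng = random.Random(0)
raw = [rng.expovariate(1.0) for _ in range(5000)]
raw_skew = skewness(raw)

# Means of n = 300 observations, matching the sample size mentioned in the thread.
mean_skew = skewness_of_means(300)

print(f"skewness of raw draws:       {raw_skew:.2f}")
print(f"skewness of means (n = 300): {mean_skew:.2f}")
```

In theory the exponential distribution has skewness 2, and the mean of n independent draws has skewness 2/sqrt(n), so the second number should come out much closer to zero than the first. That is the sense in which non-normal data matter less for inference on means as n grows.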

Which transformations did you try? Box-Cox?
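For reference, a minimal sketch of what a Box-Cox transformation does, written in plain Python rather than with any particular package. The functions `boxcox`, `boxcox_loglik` and `best_lambda` are hypothetical helpers for this example, and the data are simulated log-normal values. Note that Box-Cox needs strictly positive values, so data containing zeros would first need a shift such as y + 1, which is an arbitrary choice.

```python
import math
import random
import statistics


def boxcox(y, lam):
    """Box-Cox transform: log(y) when lam == 0, else (y**lam - 1) / lam."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox needs strictly positive values; shift the data first")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lam (up to a constant), assuming normality after transforming."""
    z = boxcox(y, lam)
    n = len(y)
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(statistics.pvariance(z)) + log_jacobian


def best_lambda(y, grid=None):
    """Pick the lam on a coarse grid that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Simulated log-normal data (assumption for illustration only).
rng = random.Random(0)
y = [rng.lognormvariate(0.0, 1.0) for _ in range(300)]
lam_hat = best_lambda(y)
print("lambda chosen on the grid:", lam_hat)
```

For log-normal data the chosen lambda should land near 0, which corresponds to the log transform. A lambda near 1 would mean the transformation barely changes the data.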
 

0.05

New Member
#3
Thank you for your clarification. I have almost 300 observations. I made a Q-Q plot of the residuals and ran a couple of tests. It's slightly non-significant in the Kolmogorov-Smirnov test and clearly non-significant with the others (Shapiro-Wilk and Anderson-Darling). I tried various transformations, adding 1 because y has zeros, e.g. log(y+1).
qq.png
This is the distribution of the dependent variable:
distr.png
 

noetsi

Fortran must die
#4
In all honesty, that does not look that bad to me (I don't trust the formal normality tests, which get criticized a lot, so I rely entirely on the Q-Q plot). When you logged the data, was the transformed data still non-normal? You might also have some extreme outliers.

You should get a second opinion. I rarely worry about normality since I have thousands of data points, so it is not a great concern to me. Parametric methods are so much better supported and more common that I would try non-parametric approaches only as a last resort.
 

noetsi

Fortran must die
#5
You might read this about the need for normality (which, remember, only affects the p-values, not the coefficient estimates):
http://rctdesign.org/techreports/arphnonnormality.pdf

Statisticians disagree, but I think it is common not to worry about non-normality with large sample sizes.

What model did you run to get the residuals? I assumed the non-normality was in the raw data, but I probably would have given the same advice.

If you ran a regression, I would look at your DFBETAs and some measure of leverage. I suspect you have some extreme outliers.
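As a rough illustration of those two diagnostics, here is a sketch for the simplest case: a regression on a single 0/1 predictor. The data are made up, with one deliberately extreme value, and the helper names `fit_line`, `leverages` and `slope_changes` are hypothetical. The DFBETA here is the raw, unscaled version (full-data slope minus leave-one-out slope), not the standardized one most software reports.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b


def leverages(x):
    """Hat values for simple regression: 1/n + (x_i - xbar)^2 / Sxx."""
    n = len(x)
    xbar = statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


def slope_changes(x, y):
    """Raw DFBETA for the slope: full-data slope minus the slope with case i removed."""
    _, b_full = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        _, b_without_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(b_full - b_without_i)
    return changes


# Made-up data: x is a 0/1 group indicator, the last y value is an outlier.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 9.0]

h = leverages(x)
d = slope_changes(x, y)
most_influential = max(range(len(d)), key=lambda i: abs(d[i]))

print("leverages:", [round(v, 3) for v in h])
print("slope changes:", [round(v, 3) for v in d])
print("most influential case:", most_influential)
```

With a balanced categorical predictor every case has the same leverage, so an unusual y value shows up in the DFBETA-type measure rather than in the hat values.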
 

0.05

New Member
#6
Thanks. Transforming the data doesn't produce any improvement. There doesn't seem to be a problem with outliers. I agree it doesn't look that bad, but I wasn't sure because of the clustering of observations near zero. I was also curious about how to do this analysis with a non-parametric test. For example, I know about Kruskal-Wallis, but I think that test only allows one factor, and in this case I have two.
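One distribution-free option for two factors is a stratified permutation test, sketched below: the treatment labels are shuffled within each age group, so the age effect is held fixed while the treatment effect is tested. This is only an illustration on made-up data. It is not Kruskal-Wallis, it tests the treatment main effect only (not the interaction), and the function names are hypothetical.

```python
import random
from collections import defaultdict


def treatment_statistic(y, treatment, stratum):
    """Between-treatment sum of squares of y after centering within each stratum."""
    by_stratum = defaultdict(list)
    for v, s in zip(y, stratum):
        by_stratum[s].append(v)
    stratum_mean = {s: sum(vs) / len(vs) for s, vs in by_stratum.items()}

    by_treatment = defaultdict(list)
    for v, t, s in zip(y, treatment, stratum):
        by_treatment[t].append(v - stratum_mean[s])
    return sum(len(vs) * (sum(vs) / len(vs)) ** 2 for vs in by_treatment.values())


def stratified_permutation_test(y, treatment, stratum, n_perm=2000, seed=0):
    """Permutation p-value for a treatment effect, shuffling labels within strata."""
    rng = random.Random(seed)
    observed = treatment_statistic(y, treatment, stratum)

    index_by_stratum = defaultdict(list)
    for i, s in enumerate(stratum):
        index_by_stratum[s].append(i)

    hits = 0
    for _ in range(n_perm):
        shuffled = list(treatment)
        for idx in index_by_stratum.values():
            labels = [treatment[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                shuffled[i] = lab
        if treatment_statistic(y, shuffled, stratum) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up skewed, non-negative data: 2 treatments x 3 age groups, 20 per cell.
rng = random.Random(42)
y, treatment, age_group = [], [], []
for age_index, age in enumerate(["young", "middle", "old"]):
    for trt in ["control", "treated"]:
        for _ in range(20):
            shift = 0.5 * age_index + (1.0 if trt == "treated" else 0.0)
            y.append(rng.expovariate(1.0) + shift)
            treatment.append(trt)
            age_group.append(age)

stat, p_value = stratified_permutation_test(y, treatment, age_group)
print(f"statistic = {stat:.2f}, permutation p-value = {p_value:.4f}")
```

Swapping the roles of the two factors gives the corresponding test for age group. The test assumes observations are exchangeable within each age group under the null hypothesis.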
 

0.05

New Member
#7
The Q-Q plot that I posted shows the residuals of a linear regression. I ran a test for outliers and it was non-significant ("testOutliers" in R's DHARMa package). Certainly there are some outliers, but my impression (maybe I'm wrong) is that they don't have much influence on the model.
outlev.png
dffits.png
Boxplot of the dependent variable:
boxplot.png
 

Karabiner

TS Contributor
#8
With n=300, you do not need to worry about the distribution of the residuals.
Since you did not describe your topic, research questions, or study design, it is
not easy to tell if transformation might be useful for other reasons.

With kind regards

Karabiner
 

0.05

New Member
#9
The dependent variable is a pathologic index, obtained by combining 4 other indices. It combines different types of metrics (measurements, estimates ...), so in the end the variable is continuous but cannot be lower than zero. Zeros are usually not that frequent, as you can see in the distribution. The predictors are treatment and age group.