# How can I analyze a non-parametric dependent variable?

#### 0.05

##### New Member
I have a continuous dependent variable and two categorical predictors. I wanted to fit a linear regression, but the residuals do not meet the normality assumption, and transformations do not help. What options do I have to analyze this? I was looking into things like GAM, but I'm not familiar with it. Is it OK to use a GAM, or are there better approaches?

#### noetsi

##### No cake for spunky
Non-parametric describes a method, not a variable (as your title suggests). Typically, when the distribution is not normal, you then use non-parametric methods. The same variable can be analyzed with parametric or non-parametric methods.

There are lots of options, starting with the realization that if you have enough cases, normality may not even matter because of the central limit theorem. How did you test for non-normality, and what form did it take? Data can be non-normal in many ways, and the solution depends on how and why it is non-normal.
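To make the central limit theorem point concrete, here is a minimal simulation sketch in Python (standard library only). Everything in it is an assumption for illustration, not the data from this thread: two equal groups of 150, right-skewed non-negative values drawn from an exponential distribution, and no true group difference. It counts how often a pooled two-sample t test rejects at the nominal 5% level.

```python
import random

def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance

def false_positive_rate(n_per_group=150, n_sims=2000, t_crit=1.968, seed=1):
    """Share of simulations in which the t test rejects a true null.

    t_crit is the approximate two-sided 5% critical value for 298 df.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Skewed data with the same distribution in both groups (null is true).
        a = [rng.expovariate(1.0) for _ in range(n_per_group)]
        b = [rng.expovariate(1.0) for _ in range(n_per_group)]
        mean_a, var_a = mean_and_variance(a)
        mean_b, var_b = mean_and_variance(b)
        # Pooled-variance t statistic for equal group sizes.
        pooled_var = (var_a + var_b) / 2
        std_error = (2 * pooled_var / n_per_group) ** 0.5
        t_stat = (mean_a - mean_b) / std_error
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / n_sims

rate = false_positive_rate()
print(f"Rejection rate at a nominal 5% level: {rate:.3f}")
```

If the printed rate is close to 0.05, skewness of this kind is not distorting the p-values much at this sample size. Heavier tails or very unequal group sizes could behave differently, so it is worth rerunning with a distribution that resembles your own residuals.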

Which transformations did you try? Box-Cox?

#### 0.05

##### New Member
Thank you for your clarification. I have almost 300 observations. I made a Q-Q plot of the residuals and ran a couple of tests. It's slightly non-significant in the Kolmogorov-Smirnov test and clearly non-significant with the others (Shapiro and Anderson). I tried various transformations, adding 1 because y has zeros, e.g. log(y+1).

This is the distribution of the dependent variable:

#### noetsi

##### No cake for spunky
In all honesty, that does not look that bad to me. (I don't trust the formal normality tests, which get criticized a lot, so I go entirely by the Q-Q plot.) When you logged the data, was the transformed data still non-normal? You might also have some extreme outliers.

You should get a second opinion. I rarely worry about normality, since I have thousands of points and so it is not a great concern to me. Parametric methods are so much better supported and more common that I would try non-parametric approaches only as a last resort.

#### noetsi

##### No cake for spunky
You might read this about the need for normality (which, remember, only affects the p-values and confidence intervals, not the coefficient estimates):
http://rctdesign.org/techreports/arphnonnormality.pdf

Statisticians disagree, but I think it is common not to worry about non-normality with large sample sizes.

What analysis did you run to get the residuals? I assumed the non-normality was in the raw data, but I probably would have given the same advice.

If you ran a regression, I would look at your DFBETA values and some measure of leverage. I suspect you have some extreme outliers.
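As a rough sketch of what those diagnostics involve, the Python snippet below computes leverage (the diagonal of the hat matrix) and raw DFBETA values for an ordinary least squares fit with two dummy-coded categorical predictors. The data are simulated and the names `treatment` and `age_group`, the number of levels, and the coefficients are all assumptions for illustration. It uses NumPy only; regression packages report the same quantities directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
treatment = rng.integers(0, 2, size=n)   # hypothetical two-level factor
age_group = rng.integers(0, 3, size=n)   # hypothetical three-level factor
# Made-up response with skewed (exponential) errors.
y = 1.0 + 0.5 * treatment + 0.3 * age_group + rng.exponential(1.0, size=n)

# Design matrix: intercept plus dummy codes (first level is the reference).
X = np.column_stack([
    np.ones(n),
    treatment == 1,
    age_group == 1,
    age_group == 2,
]).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Leverage h_i = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# DFBETA: change in each coefficient when observation i is deleted,
# beta - beta_(i) = (X'X)^-1 x_i * e_i / (1 - h_i).
dfbeta = (X @ XtX_inv) * (resid / (1 - leverage))[:, None]

p = X.shape[1]
high_leverage = np.flatnonzero(leverage > 2 * p / n)  # common rule of thumb
most_influential = np.argsort(np.abs(dfbeta[:, 1]))[-5:]
print("High-leverage rows:", high_leverage)
print("Rows that move the treatment coefficient most:", most_influential)
```

With purely categorical predictors the leverage of an observation depends only on which cell it falls in, so extreme leverage can only come from small cells. Influence is then driven mostly by the size of the residual.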


#### 0.05

##### New Member
Thanks. Transforming the data doesn't produce any improvement. There doesn't seem to be a problem with outliers. I agree it doesn't look that bad, but I wasn't sure because of the clustering of observations near zero. I was also curious about how to do this analysis with a non-parametric test. For example, I know about Kruskal-Wallis, but I think this test only allows one factor, and in this case I have two.
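One approach that handles a second factor, and which is not Kruskal-Wallis, is a permutation test that shuffles the treatment labels only within each level of the other factor. The Python sketch below (standard library only) is a minimal illustration under stated assumptions: the function name `stratified_permutation_test` is hypothetical, the treatment is coded 0/1, the data are simulated, and only the treatment main effect is tested, not the interaction.

```python
import random

def stratified_permutation_test(y, treatment, stratum, n_perm=2000, seed=0):
    """Permutation p-value for a 0/1 treatment effect, shuffling within strata."""
    rng = random.Random(seed)
    # Group observation indices by stratum (for example, age group).
    by_stratum = {}
    for i, s in enumerate(stratum):
        by_stratum.setdefault(s, []).append(i)

    def statistic(labels):
        # Size-weighted average of the within-stratum mean differences.
        total = 0.0
        for idx in by_stratum.values():
            treated = [y[i] for i in idx if labels[i] == 1]
            control = [y[i] for i in idx if labels[i] == 0]
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            total += len(idx) * diff
        return total / len(y)

    observed = statistic(treatment)
    labels = list(treatment)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle labels inside each stratum so the stratum effect is preserved.
        for idx in by_stratum.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, lab in zip(idx, shuffled):
                labels[i] = lab
        if abs(statistic(labels)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up demo data: 300 observations, balanced design, skewed errors.
data_rng = random.Random(42)
n_obs = 300
treatment = [i % 2 for i in range(n_obs)]
age_group = [("young", "middle", "old")[(i // 2) % 3] for i in range(n_obs)]
y = [data_rng.expovariate(1.0) + 1.0 * t for t in treatment]

effect, p_value = stratified_permutation_test(y, treatment, age_group)
print(f"Estimated treatment effect: {effect:.2f}, permutation p-value: {p_value:.4f}")
```

Each stratum must contain both treated and control observations, otherwise the within-stratum difference is undefined. Because the test makes no distributional assumption about the residuals, the pile-up near zero is not a problem for it.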

#### 0.05

##### New Member
The Q-Q plot that I posted shows the residuals of a linear regression. I ran a test for outliers and it was not significant (`testOutliers` in the R package DHARMa). Certainly, there are some outliers, but my impression (maybe I'm wrong) is that they don't have much influence on the model.

Boxplot of the dependent variable:

#### Karabiner

##### TS Contributor
With n=300, you do not need to worry about the distribution of the residuals.
Since you did not describe your topic, research questions, or study design, it is
not easy to tell if transformation might be useful for other reasons.

With kind regards

Karabiner

#### 0.05

##### New Member
The dependent variable is a pathological index, obtained by combining four other indices. It combines different types of metrics (measurements, estimates, and so on), so in the end the variable is continuous but cannot be lower than zero. Zeros are not very frequent, as you can see in the distribution. The predictors are treatment and age group.