Assumptions of multiple regression violated

I’m writing my thesis on the question: is ruminative thinking an independent risk factor for marihuana use (controlling for sex and depression)? The study has 300 participants, of whom 60 used marihuana. To investigate this relationship I wanted to run a multiple regression analysis. However, after analyzing the data I found that the following assumptions of multiple regression are violated:

Linear relationship between the dependent variable and each of the independent variables.

Normal distribution of residuals (errors).

Probably one of the problems is the way marihuana use (the DV) is measured: "On how many days during the last 30 days did you use marijuana?" (0-30 days). Because the study included both users and non-users, only 60 of the 300 participants reported any use. Consequently, 240 participants answered 0 days, which is why the DV is far from normally distributed. Should I transform the data in some way, or should I use a non-parametric test? Or is there another option?

Thank you very much for every response

PS: I hope I have given enough information to answer the question. If not, please tell me and I will provide it.


Active Member
One approach is to divide the problem into two questions:

1. What factors predict any marijuana use.
2. What factors predict frequency of marijuana use among those who use marijuana.

The dependent variable in the first question is dichotomous, and so the question can be addressed by using, say, logistic regression. The dependent variable in the second question is continuous and may be amenable to ordinary linear regression.
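As a rough illustration of the two-part split described above, here is a minimal Python sketch of how the single 0-30 outcome could be turned into the two dependent variables. The `days_used` values are made up for illustration and are not the thesis data; the actual model fitting (logistic regression for part 1, a regression among users for part 2) would be done in whatever statistics package you use.

```python
# Hypothetical responses: days of use in the last 30 days, one value per participant.
days_used = [0, 0, 3, 0, 1, 0, 12, 0, 2, 0]

# Part 1 outcome: did the participant use at all? (1 = yes, 0 = no)
any_use = [1 if d > 0 else 0 for d in days_used]

# Part 2 outcome: frequency, restricted to participants who reported any use.
# Keep the row indices so the predictors can be subset the same way.
user_rows = [i for i, d in enumerate(days_used) if d > 0]
days_among_users = [days_used[i] for i in user_rows]

print(any_use)
print(user_rows)
print(days_among_users)
```

The same row indices would be used to pick out rumination, sex and depression scores for the users-only model, so that both parts are fitted on consistent data.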
Thanks for your response j58,

The first question can be answered. However, after analysing the data I found that for the second question I run into the same problems as before. About half of the people who smoke marihuana answered that they smoked only one or two times in the last 30 days, resulting in a non-normal distribution.

There is also still no linear relationship between marihuana use and ruminative thinking




Normal distribution of residuals (errors).


Can I still perform a linear regression with these violations, or will this make the interpretation invalid?


Active Member
Any discernible pattern in a plot of residuals vs predicted values indicates that the model does not fit. You will probably need to find a normalizing transformation for your dependent and/or independent variable, fit a more complex linear model, or abandon linear regression in favor of a regression method better suited to count data, such as Poisson regression.
Thanks for your response.

I don't know which transformation I would need to normalize the data, so I think I will use Poisson regression as you suggested. One more question, if you don't mind. For my second research question I wanted to add the variable mindfulness and investigate which component of mindfulness explains the relationship between ruminative thinking and marihuana use. Because mindfulness and ruminative thinking are strongly correlated, I wanted to use the Baron & Kenny procedure for testing a mediation hypothesis. Do you know if I can also use Poisson regression for mediation, or should I use something different?


Less is more. Stay pure. Stay poor.
I didn't really read much of the above posts (just skimmed and skipped), but if you have zero-heavy data for the dependent variable, variants of the Poisson model may be useful (zero-inflated Poisson, negative binomial, etc.).
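To get a feel for how zero-heavy the outcome is, here is a small Python sketch comparing the observed share of zeros (240 of 300, from the first post) with the share a plain Poisson distribution would produce at the same overall mean. The average number of days among the 60 users is not given in the thread, so the values tried below are purely hypothetical, and the comparison ignores covariates.

```python
import math

n_total = 300   # participants (from the thread)
n_zero = 240    # participants reporting 0 days (from the thread)
n_users = n_total - n_zero

observed_zero_share = n_zero / n_total


def poisson_zero_share(mean):
    """P(Y = 0) for a Poisson variable with the given mean."""
    return math.exp(-mean)


print("observed share of zeros:", observed_zero_share)

# Hypothetical average days of use among the 60 users; not taken from the data.
for users_mean in (2, 5, 10):
    overall_mean = users_mean * n_users / n_total
    print(users_mean, overall_mean, round(poisson_zero_share(overall_mean), 3))
```

If the Poisson share of zeros comes out well below the observed share for plausible user means, that is the pattern zero-inflated or hurdle-type models are meant for. Predictors in a regression can account for some of the zeros, so treat this only as a rough screen.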
Thanks for your response hlsmith,

When I follow this video and perform a One-Sample Kolmogorov-Smirnov Test, I find that Asymp. Sig. (2-tailed) = 0,000, which means, according to the video, that the data don't follow a Poisson distribution. This would mean that the assumptions for performing a Poisson regression are violated. Even when I analyse only the marijuana users, it still gives the same result.

Does this mean another analysis has to be used, or can I make it work?


Ambassador to the humans
That's not quite right. What you did tests the marginal distribution of the dependent variable to see whether a Poisson distribution makes sense for it. What Poisson regression assumes is a conditional Poisson distribution (Poisson given the predictors) - so we wouldn't necessarily expect the marginal distribution to look like a Poisson. With that said, I have doubts that a Poisson would work great for this particular problem.
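The marginal-versus-conditional point can be illustrated with a small simulation. This is only a sketch on made-up data: the predictor `x`, the coefficients 0.5 and 1.0, and the sample size are all assumptions chosen for illustration. The outcome is generated to be exactly Poisson given `x`, and the variance-to-mean ratio (which is 1 for a Poisson) is then compared for the whole sample and for a narrow band of `x` values.

```python
import math
import random
import statistics

random.seed(1)


def draw_poisson(mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, random.random()
    while product > limit:
        count += 1
        product *= random.random()
    return count


# Hypothetical data: y is exactly Poisson *given* x, with log-mean 0.5 + 1.0 * x.
xs = [random.gauss(0, 1) for _ in range(20000)]
ys = [draw_poisson(math.exp(0.5 + 1.0 * x)) for x in xs]


def variance_to_mean(values):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(values) / statistics.mean(values)


# Marginal view: all y values pooled, ignoring x.
marginal_ratio = variance_to_mean(ys)

# Approximate conditional view: only cases whose x lies in a narrow band.
band = [y for x, y in zip(xs, ys) if -0.05 < x < 0.05]
conditional_ratio = variance_to_mean(band)

print("marginal variance/mean:", round(marginal_ratio, 2))
print("conditional variance/mean:", round(conditional_ratio, 2))
```

In this setup the pooled outcome should look clearly non-Poisson (variance well above the mean) even though the model is correct by construction, which is why a test on the raw dependent variable says little about whether a Poisson regression is appropriate.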

I would kind of be interested in seeing what a scatterplot looks like for the log of the dependent variable after you exclude the 0s.
All right, then I didn't fully understand the video. Thanks for pointing that out.

After looking into different options, I found that because of over-dispersion the negative binomial distribution is the best alternative to Poisson regression. Do you agree?

For the scatterplot of the log of the dependent variable after excluding the 0s, I assume you mean this.

This is without the log function

PS: Thanks for helping.