Standard multiple regression help

MJFlavell

New Member
Hi,
I am running a standard multiple regression for my dissertation, but I am finding some of the output difficult to understand when it comes to checking whether the assumptions of multiple regression have been violated. I have never seen a scatterplot like the one below and do not know what it means for my assumptions or how to remedy it; no textbook covers this either. Any advice on any of the outputs below would be much appreciated. If there is any further information I can provide, please let me know and I will do so.

noetsi

Fortran must die
Are your predictor variables ordinal? And what is your dependent variable, and how is it coded?
I am guessing your dependent variable is not interval, which is a problem if true, given that you appear to be running linear regression.

You don't have a lot of data, judging by what your standardized predicted values show. Why so few data points? Having a small sample size makes violations of your assumptions much worse and creates power issues.

If a PP plot is similar to a QQ plot (I am not familiar with PP plots), then your data are not normally distributed.
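For reference, the two plots are related but not identical: a QQ plot compares the sorted residuals with theoretical normal quantiles, while a PP plot compares cumulative probabilities. Below is a minimal sketch in Python (standard library only) that computes the points for both. It uses simulated residuals, not the data from this thread, and the function names `qq_points`, `pp_points`, and `correlation` are made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    n = len(residuals)
    std_normal = NormalDist()
    observed = sorted(residuals)
    # Plotting positions (i - 0.5) / n avoid probabilities of exactly 0 or 1.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def pp_points(residuals):
    """Return (expected, observed) cumulative probabilities for a normal PP plot."""
    n = len(residuals)
    # Normal distribution with the same mean and standard deviation as the residuals.
    fitted = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    observed = [(i - 0.5) / n for i in range(1, n + 1)]
    expected = [fitted.cdf(r) for r in ordered]
    return expected, observed

def correlation(xs, ys):
    """Pearson correlation; values near 1 mean the points lie close to a line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulated residuals (assumption: invented data, 75 cases chosen arbitrarily).
random.seed(0)
normal_resid = [random.gauss(0, 1) for _ in range(75)]
skewed_resid = [random.expovariate(1.0) - 1.0 for _ in range(75)]

r_normal = correlation(*qq_points(normal_resid))
r_skewed = correlation(*qq_points(skewed_resid))
print(f"QQ correlation, normal residuals: {r_normal:.3f}")
print(f"QQ correlation, skewed residuals: {r_skewed:.3f}")
```

In either plot, points that fall close to a straight line are consistent with normally distributed residuals; a clearly curved pattern is not.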

MJFlavell

New Member
Only 83 people did my survey and some had to be removed for various reasons. I know it's not ideal, but it's what I've got to work with. My predictor variables are gender and participants' total scores on a risky behaviour scale that I had to develop with my supervisor. My outcome variable is participants' total score on a fear of crime scale.
Any recommendations going forward?
Thank you for replying, by the way. Much appreciated.

noetsi

Fortran must die
What is the range of your dependent variable? From what to what?

Violations of these assumptions won't bias the slope estimates, but they will affect the statistical tests. There is a range of solutions for violations, none of which are easy to do. Transforming the data is the most common way to deal with the problem.
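As a rough illustration of what a transformation does, here is a minimal Python sketch that measures skewness before and after a log transform. The log is only one common choice and is an assumption here, since no specific transformation is named above; the scores are invented for illustration, and `skewness` is a hypothetical helper.

```python
import math
from statistics import mean

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed scores, invented for illustration only.
raw_scores = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 40]

# Natural log transform; it requires strictly positive values.
log_scores = [math.log(v) for v in raw_scores]

skew_raw = skewness(raw_scores)
skew_log = skewness(log_scores)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

After transforming the outcome, the regression has to be refitted and the residual plots checked again, and the coefficients are then interpreted on the transformed scale.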

noetsi

Fortran must die
If you have 15 distinct levels, it's probably interval-like, although there is no agreement on that concept (the literature I have seen suggests that 7 distinct levels is usually OK in practice). Your major issue will be having so few cases.

I don't think you have normally distributed data, so your confidence intervals will be doubtful. I think you just have to say that. There are possible solutions, such as gathering more data, transformations, and possibly bootstrapping.
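To make the bootstrapping idea concrete, here is a minimal Python sketch of a percentile bootstrap confidence interval for a regression slope, which does not rely on normally distributed residuals. It is a simplification under stated assumptions: one predictor instead of two, simulated data rather than the survey data, and hypothetical names (`ols_slope`, `bootstrap_slope_ci`, `risk`, `fear`).

```python
import random
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample cases with replacement and refit."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip a degenerate resample with no spread in x
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * len(slopes))]
    upper = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lower, upper

# Simulated data (assumption: invented values, 75 cases chosen arbitrarily).
data_rng = random.Random(42)
risk = [data_rng.uniform(0, 30) for _ in range(75)]
fear = [10 + 0.8 * x + data_rng.gauss(0, 5) for x in risk]

slope_hat = ols_slope(risk, fear)
ci_low, ci_high = bootstrap_slope_ci(risk, fear)
print(f"slope = {slope_hat:.3f}, 95% bootstrap CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling whole cases keeps each participant's predictor and outcome together. With a small sample the interval can still be wide, so this does not replace gathering more data.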