significant r but assumptions not met

gianmarco

TS Contributor
#1
Dear All,
The issue here stems from the one discussed in a previous post of mine (http://talkstats.com/showthread.php?t=13844).

I have a dataset comprising 8 objects, on which measurements were taken (cm). I want to check whether a linear relation is likely to exist between the two variables. The sample is admittedly small.

I get an r value of 0.96, with p = 0.001. So far so good (?). But inspection of the standardized residual plot suggests that non-linearity could be present (if I am reading the scatterplot correctly).

Maybe the whole thing is weakened by the small sample analysed.

Unfortunately, these data are all that I have at hand.

So, the questions are:
1) Is the discrepancy between the r value and the non-linearity (?) (as revealed by the scatterplot of the residuals) due to the very limited amount of data?

2) Is it correct to say that there is insufficient evidence for a linear relation between the variables?

3) Does using a bootstrapped r make any sense in this context?
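
For question 3, a minimal sketch of what a bootstrapped r could look like in Python, assuming NumPy is available. The function name bootstrap_r and the arrays x_demo and y_demo are hypothetical; the numbers are placeholders, not the measurements from this thread. With only 8 pairs the resampled r values are very unstable, so the interval should be read with caution.

```python
import numpy as np

def bootstrap_r(x, y, n_boot=5000, seed=0):
    """Percentile bootstrap interval for Pearson's r, resampling (x, y) pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n pairs with replacement
        xb, yb = x[idx], y[idx]
        # r is undefined if a resample has no spread in x or in y; skip it
        if xb.max() == xb.min() or yb.max() == yb.min():
            continue
        rs.append(np.corrcoef(xb, yb)[0, 1])
    return np.percentile(np.array(rs), [2.5, 97.5])

# Hypothetical placeholder data (NOT the data discussed in this thread)
x_demo = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.6, 9.5]
y_demo = [1.0, 1.5, 1.3, 2.1, 1.9, 2.6, 2.4, 5.8]

lo, hi = bootstrap_r(x_demo, y_demo)
print(f"95% percentile bootstrap interval for r: [{lo:.2f}, {hi:.2f}]")
```

Resampling whole pairs keeps each x with its own y; a single extreme point will be in some resamples and missing from others, which is one reason the interval can be wide.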

Thanks for any help provided,
Gm
 

Dason

Ambassador to the humans
#2
I don't see how you're getting nonlinearity from the residual plot. You might be worried about decreasing variance but it looks linear to me. Also variance issues are quite hard to detect with such small sample sizes.

However, the real concern to me is that you might not have as strong a relationship as you think because of the one quite large value. It has a lot of leverage and really changes the prediction line. If you take that point out (reducing your data set to 7 points... yikes) and rerun the analysis, that could help you see what's going on. It's possible that the line changes quite a bit.
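
A rough sketch of that check in Python, assuming NumPy. The helper names leverage and fit_line and the arrays x_demo and y_demo are hypothetical, and the numbers are placeholders rather than the poster's data. It computes the hat values for a simple linear regression, then refits the line with and without the highest-leverage point.

```python
import numpy as np

def leverage(x):
    """Hat values h_i for simple linear regression with an intercept."""
    x = np.asarray(x, dtype=float)
    dev2 = (x - x.mean()) ** 2
    return 1.0 / len(x) + dev2 / dev2.sum()

def fit_line(x, y):
    """Least-squares slope and intercept, plus Pearson's r."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical placeholder data (NOT the data discussed in this thread)
x_demo = np.array([2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.6, 9.5])
y_demo = np.array([1.0, 1.5, 1.3, 2.1, 1.9, 2.6, 2.4, 5.8])

h = leverage(x_demo)
i_max = int(np.argmax(h))                  # index of the highest-leverage point
keep = np.arange(len(x_demo)) != i_max     # mask that drops that one point

print("hat values:", np.round(h, 2))
print("all points (slope, intercept, r):", fit_line(x_demo, y_demo))
print("without point", i_max, ":", fit_line(x_demo[keep], y_demo[keep]))
```

If the slope and r move a lot between the two fits, the strong correlation is resting mostly on that single point.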
 

gianmarco

TS Contributor
#3
Hi Dason,
thanks for your prompt reply.

On the one hand, your words agree with what I said about the small sample size. I will check the CI for the population r: I suspect that it will include 0.
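
One common way to get such a CI is Fisher's z transformation; a minimal sketch in Python using the r and n reported in post #1. The function name fisher_ci is hypothetical, and the method assumes approximate bivariate normality, which is hard to verify with 8 points.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for the population correlation via Fisher's z."""
    z = math.atanh(r)                 # Fisher transform of the sample r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = fisher_ci(0.96, 8)           # r = 0.96, n = 8 as reported in post #1
print(f"approximate 95% CI for the population r: [{lo:.2f}, {hi:.2f}]")
```

For r = 0.96 and n = 8 this gives roughly 0.79 to 0.99, so on the full data set the interval stays above 0. It says nothing about what happens once the high-leverage point is removed.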

As for my understanding of the scatterplot of the residuals, I am relying on literature I have read, which says that in simple linear regression the scatterplots of both normalized residuals vs x values and normalized residuals vs predicted y should show a random scatter of points.
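
A small sketch of how those residuals can be computed in Python, assuming NumPy. The function name standardized_residuals and the arrays x_demo and y_demo are hypothetical placeholders, not my actual data. It uses one common definition, the internally studentized residual e_i / (s * sqrt(1 - h_i)); the values can then be plotted against x or against the fitted y.

```python
import numpy as np

def standardized_residuals(x, y):
    """Fitted values and internally studentized residuals for y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    resid = y - fitted
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error
    dev2 = (x - x.mean()) ** 2
    h = 1.0 / n + dev2 / dev2.sum()             # hat values (leverage)
    return fitted, resid / (s * np.sqrt(1.0 - h))

# Hypothetical placeholder data (NOT the data discussed in this thread)
x_demo = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.6, 9.5]
y_demo = [1.0, 1.5, 1.3, 2.1, 1.9, 2.6, 2.4, 5.8]

fitted, z = standardized_residuals(x_demo, y_demo)
for xi, fi, zi in zip(x_demo, fitted, z):
    print(f"x = {xi:4.1f}   fitted = {fi:5.2f}   std. residual = {zi:6.2f}")
```

With only 8 residuals, an apparent pattern (or its absence) in such a plot is weak evidence either way.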



Best Regards,
Gm