Correlation and Scatterplot

#1
Hello all, I'm in a bit of a quandary with a psychology paper. My two variables are self-esteem and perceived body image dissatisfaction (PBID) and we've been asked to explore the correlation between them. I'm more or less done - the data is non-normally distributed (based on standardised skewness and Shapiro-Wilk), so I ran Spearman, which revealed a moderate correlation.
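"Standardised skewness" is usually the sample skewness divided by its standard error. Values outside roughly ±1.96 are taken as evidence of significant skew at the 5% level. The sketch below assumes the adjusted Fisher-Pearson estimator (G1) and its normal-theory standard error, which I believe is what SPSS reports. Other packages may use a slightly different estimator, and the function names are illustrative only.

```python
import math

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment

def skewness_se(n):
    """Standard error of G1 for a sample of size n drawn from a normal population."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def standardized_skewness(xs):
    """Skewness divided by its standard error; |value| > 1.96 flags skew at about 5%."""
    return sample_skewness(xs) / skewness_se(len(xs))
```

With n = 221 the standard error is about 0.164. A standardised value of 5.6 therefore corresponds to a raw skewness of roughly 0.9.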

To my question - I'm not sure whether to go with a linear or a curved line for the scatter plot. My understanding is that non-normal distribution of data usually entails a non-linear relationship, and hence a curved line, but admittedly, I'm not sure of my grounds here. That's why I ask - based on the pictures below, would you go with a linear or a non-linear relationship?

[Attached: two scatterplots of the data, the first with a straight fitted line and the second with a curved fitted line.]
 

hlsmith

#2
First step: after fitting the line, plot the residuals and look for homogeneity (constant spread) or departures from linearity in them. What did you use to get the line? If it was a linear model with and without a quadratic term, I believe you could compare these nested models using an F-test.
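Here is a rough, text-only version of that residual check. It fits an ordinary least-squares line, sorts the residuals by x, and summarises them in a few equal-sized groups. This only stands in for looking at an actual residual plot. The helper names and the three-group split are arbitrary choices for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def residuals(xs, ys):
    """Observed minus fitted values from the straight-line fit."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def residual_summary_by_bin(xs, ys, n_bins=3):
    """Sort (x, residual) pairs by x, cut them into n_bins groups of equal size
    (the last group takes any leftovers), and return (mean, SD) of the residuals
    in each group."""
    pairs = sorted(zip(xs, residuals(xs, ys)))
    size = len(pairs) // n_bins
    summary = []
    for i in range(n_bins):
        chunk = pairs[i * size:] if i == n_bins - 1 else pairs[i * size:(i + 1) * size]
        res = [r for _, r in chunk]
        summary.append((statistics.fmean(res), statistics.stdev(res)))
    return summary
```

A pattern of group means that goes positive, negative, positive (or the reverse) suggests the straight line is missing curvature. Group SDs that change a lot suggest the spread is not homogeneous.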

P.S. I can't really tell a great difference between the model fits with the naked eye. Parsimony says to use the simpler model when the differences are negligible.
 
#3
Is there a reason to assume that there is a simple mathematical formula that connects the two variables? If not, then you can simply say that there is a moderate correlation and that increasing values of self esteem are associated with smaller shape sums. Then draw a graph with a locally weighted regression line through it to show the trend.
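Below is a bare-bones sketch of locally weighted regression. For each x, it fits a straight line to that point's nearest neighbours, giving closer points more weight through tricube weights, and takes that line's value at x. The `frac` parameter (the share of points used in each local fit) is an assumption you would tune. Full implementations such as R's `lowess()` or statsmodels' `lowess` also add robustness iterations that down-weight outliers, and this version omits them.

```python
def lowess(xs, ys, frac=0.5):
    """Local linear smoother with tricube weights (no robustness iterations)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours used in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = []
        for x in xs:
            u = abs(x - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        sw = sum(w)  # always >= 1 because x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx > 0 else 0.0  # weighted least-squares slope
        fitted.append(my + b * (x0 - mx))  # local line evaluated at x0
    return fitted
```

Plotting the fitted values against x, sorted by x, draws the trend curve without committing to any formula.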
If you really feel the need to have a mathematical regression line, then the stretching out of the points towards the top of the graph indicates that an exponential trend line might work.
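One common way to get an exponential trend line, y = A·e^(Bx), is to regress ln(y) on x by ordinary least squares and then transform back. This sketch assumes every y value is strictly positive, because the logarithm is undefined otherwise. If the shape sums can be zero, the approach would need a shift or a different model.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A * exp(B * x) by least squares on ln(y); returns (A, B)."""
    if any(y <= 0 for y in ys):
        raise ValueError("exponential trend needs strictly positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
    b = sxl / sxx                # slope on the log scale
    a = math.exp(ml - b * mx)    # back-transformed intercept
    return a, b
```

The fit is done on the log scale, so it minimises errors in ln(y) rather than in y. As a result, the stretched-out points at the top of the graph pull on the line less than they would in a direct fit.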
 
#4
I very much appreciate the replies. To answer your question: no, there's no reason to assume a simple mathematical formula. But we've been asked to note the presence (or absence) of linearity on the scatter plot. I'm having a hard time judging whether the data on this scatter plot could be considered linear, that is, whether the first graph, with the straight line on it, is viable. Maybe it's just about linear, but it's borderline, and there are a bunch of outliers.

So to my next question: here's the results section of my paper. Does this look more or less defensible, especially if I go with the linear scatter plot?

Thank you,
Aufbau83

Results:
221 individuals recruited by opportunity sampling were surveyed about their level of self-esteem (M = 3.83, SD = 0.7) and their level of current and recent Concern about Body Shape (CABS; M = 20.6, SD = 9.3). A scatterplot (Figure 3) indicated a monotonic, but not conclusively linear, relationship between the two variables. Furthermore, histograms (Figures 4 and 5) indicated non-normal distributions for both variables. Standardized skewness coefficients confirmed this (self-esteem = -2.3; CABS = 5.6; normal range = -1.96 to +1.96), as did Shapiro-Wilk tests (self-esteem: W(221) = 0.98, p = .002; CABS: W(221) = 0.93, p < .001). Because of the non-normal distributions and the lack of a clearly linear relationship, a nonparametric procedure, Spearman's rank-order correlation, was used. This showed a moderate negative correlation between self-esteem and CABS (rs = -0.4, p < .05). The null hypothesis can thus be rejected in favour of the alternative hypothesis.
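Spearman's rho is simply Pearson's correlation computed on ranks, with tied values given the average of the ranks they occupy. The sketch below is a plain-Python illustration and is not meant to replace the statistics package used for the paper. `spearman_t` is the usual large-sample t approximation, r_s·sqrt((n-2)/(1-r_s²)) on n-2 degrees of freedom. It is one way to get a specific p-value instead of reporting only p < .05.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def spearman_t(r, n):
    """Large-sample t statistic for rho, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With r_s = -0.4 and n = 221, this gives t of about -6.5 on 219 degrees of freedom. Under that approximation, the p-value is far below .05.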
 
#5
PS. I get what you're saying about a locally weighted regression line and an exponential trend line, respectively. Does this mean you're convinced that a linear relationship is untenable here, and that the line has to be curved for the best fit? Or could I perhaps get away with it?
 
#6
I personally would be surprised if there really was an underlying quadratic relationship. They are quite rare in real life. What would the coefficients mean in clinical terms?
At least with the linear graph you can say things like "on average, each extra unit of self esteem reduces the body image sum by about 6 points" and "the effect is not as marked at high esteem values". (The graph looks like it is flattening.)
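The "about 6 points" figure is the slope b of the least-squares line. For a straight-line fit, b is tied to Pearson's r and the two standard deviations. The results report Spearman's r_s rather than Pearson's r, so plugging r_s in is only an assumed approximation. As a rough cross-check, the reported summaries (SD 9.3 for the shape sum, SD 0.7 for self-esteem) give a slope of the same order:

```latex
b = r \,\frac{s_y}{s_x}
\qquad\Rightarrow\qquad
|b| \approx 0.4 \times \frac{9.3}{0.7} \approx 5.3 \text{ points per unit of self-esteem}
```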
As hlsmith says, there are formal tests for whether the quadratic term is significant. More simply, perform the regression with X and X^2 and look at the p value for X^2, but beware that p values can't be trusted unless the residuals are normal. Did your lecturer indicate that this level of statistics would be needed?
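Below is a minimal sketch of the nested-model comparison. It fits y on x, then on x and x², and compares the residual sums of squares. When only one term is added, the F statistic equals the square of the t statistic for the x² coefficient, so this is the same test as looking at the p value for X^2. The helper names are hypothetical. To turn F into a p-value, compare it with an F(1, n - 3) distribution, e.g. `scipy.stats.f.sf(F, 1, n - 3)` if SciPy is available.

```python
def solve(a, b):
    """Solve the small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss_poly(xs, ys, degree):
    """Residual sum of squares of a least-squares polynomial fit (via normal equations)."""
    p = degree + 1
    rows = [[float(x) ** k for k in range(p)] for x in xs]  # columns 1, x, x^2, ...
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    return sum((y - sum(c * v for c, v in zip(coef, row))) ** 2 for row, y in zip(rows, ys))

def quadratic_f(xs, ys):
    """F statistic for adding x^2 to a straight-line model; returns (F, residual df)."""
    n = len(xs)
    rss_linear = rss_poly(xs, ys, 1)
    rss_quadratic = rss_poly(xs, ys, 2)
    df = n - 3  # residual degrees of freedom of the larger model
    return (rss_linear - rss_quadratic) / (rss_quadratic / df), df
```

A large F (small p) says the curved model fits meaningfully better. A small F supports keeping the simpler straight line on grounds of parsimony.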
As a complication, does self esteem cause body image?
 
#8
Quote: "I personally would be surprised if there really was an underlying quadratic relationship. They are quite rare in real life. What would the coefficients mean in clinical terms?"
A very good question, and this is a matter for the discussion section of the report. I find a moderate negative correlation between self-esteem and "concern about body image" - as most previous studies have - but I try to problematise it, not only because the scatterplot isn't exactly conclusive (there are a bunch of outliers and the correlation is far from perfect), but also because both concepts are a little... nebulous.

Quote: "At least with the linear graph you can say things like "on average, each extra unit of self esteem reduces the body image sum by about 6 points" and "the effect is not as marked at high esteem values". (The graph looks like it is flattening.)"
Thanks for this - I integrated this idea into my paper. And yes, I went with the linear graph. To answer your question, it's an undergraduate psychology degree, and there's no indication that the level of statistics you identify is needed.

That said, I still don't fully understand how to determine if a scatter plot is linear, but maybe it's not about hard and fast rules, but rather about what you judge to be the most effective means of presenting information. Are scatter plots purely visual aids, then?

Quote: "I hate p-values, but for the other stats you provide exact p-values, and for Spearman you present the extra yucky-looking < .05."
That's precisely what I had to do :-(
 

Karabiner

#9
Quote: "More simply, perform the regression with X and X^2 and look at the p value for X^2, but beware that p values can't be trusted unless the residuals are normal."
Fortunately, n is large enough (n = 221), so the normality assumption is not relevant here.

With kind regards

Karabiner