Exploring a non-linear relationship between two variables

#1
I ran a study measuring the relationship between metamemory and actual memory scores on a test. Basically, what I find is that only people with good or bad memory have accurate metamemory; those in the middle of the distribution do not. I have analysed my results with simple correlations (a weak but reliable correlation when the whole sample is included) and by dividing the sample into quartiles according to scores on the objective memory test. Is there a different way to analyse these data?
Thanks!
 

katxt

Active Member
#2
Do you have something you want to prove? Do you need a p value for a paper or report?
The graph of metamemory vs memory sounds U-shaped. One common way to analyse this is to include a quadratic term in the regression.
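For what it's worth, here is a minimal sketch of the quadratic-term idea in Python with NumPy. The helper name `fit_quadratic` and the toy numbers are made up for illustration; they are not from the study.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 (hypothetical helper).

    Returns the coefficients in the order (b0, b1, b2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, linear term, quadratic term.
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Toy data with an exact U shape, only to show the mechanics.
x_demo = np.arange(-5, 6)
y_demo = 2.0 + 0.5 * x_demo + 3.0 * x_demo ** 2
b0, b1, b2 = fit_quadratic(x_demo, y_demo)
```

A b2 that is clearly different from zero would point to curvature. Centring x before squaring usually makes b1 and b2 easier to interpret.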
 
#3
Thanks for your reply. Basically, my hypothesis was that only good and bad memorizers would have insight into their memory, and this is what I found: the correlation was significant only for the first and fourth quartiles. However, one of the reviewers didn't like the approach of dividing the sample into quartiles, even though it is a common approach in this kind of research.
 

noetsi

Fortran must die
#4
LOESS (local regression) is sometimes suggested for non-linear relationships, but personally I struggle with it. If you are interested, here is an introduction:
https://en.wikipedia.org/wiki/Local_regression
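To give a feel for what local regression does, here is a simplified sketch in plain Python: a local linear fit with tricube weights, evaluated one point at a time. It is in the spirit of LOESS but not the full algorithm (no robustness iterations). The name `loess_point`, the default `frac`, and the toy data are assumptions for illustration.

```python
def loess_point(x, y, x0, frac=0.5):
    """Local linear estimate at x0 using tricube weights (illustrative only)."""
    n = len(x)
    k = max(2, int(round(frac * n)))      # number of neighbours in the local fit
    dists = sorted(abs(xi - x0) for xi in x)
    # Bandwidth: distance to the k-th neighbour, slightly inflated so that
    # neighbour keeps a small positive weight even when distances are tied.
    h = dists[k - 1] * 1.001 + 1e-12
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Made-up scores, only to show how a smoothed curve is produced.
x_toy = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y_toy = [70, 62, 50, 47, 45, 46, 44, 35, 25, 15]
smoothed = [loess_point(x_toy, y_toy, xi) for xi in x_toy]
```

Plotting `smoothed` against `x_toy` on top of the raw scatter shows where the slope flattens or steepens without forcing a single straight line on the data.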

It seems like you could create a dummy variable, with 1 being good or bad memory and 0 being the middle, although I have not seen that done, so there may be problems.
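A rough sketch of that dummy coding, using only the Python standard library. The quartile cut-offs, the function names and the toy scores are all assumptions for illustration; nothing here comes from the actual data.

```python
from statistics import mean, quantiles

def extreme_dummy(memory_scores):
    """Return 1 for scores in the bottom or top quartile, 0 for the middle half."""
    q1, _, q3 = quantiles(memory_scores, n=4)
    return [1 if s <= q1 or s >= q3 else 0 for s in memory_scores]

def regress_on_dummy(dummy, outcome):
    """OLS of outcome on a 0/1 dummy.

    The intercept is the mean of group 0 and the slope is the
    difference between the two group means.
    """
    group0 = [o for d, o in zip(dummy, outcome) if d == 0]
    group1 = [o for d, o in zip(dummy, outcome) if d == 1]
    intercept = mean(group0)
    slope = mean(group1) - intercept
    return intercept, slope

# Made-up scores: objective memory and self-reported memory problems.
memory = [10, 20, 30, 40, 50, 60, 70, 80]
metamemory = [90, 80, 55, 50, 52, 48, 30, 20]
dummy = extreme_dummy(memory)
intercept, slope = regress_on_dummy(dummy, metamemory)
```

In this toy example the low and high scorers sit on opposite sides of the middle group, so they nearly cancel once they are pooled under the same code. That may be one of the problems with lumping good and bad memory together.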
 

katxt

Active Member
#5
Quartiles do have an arbitrary feel about them that tends to make statisticians feel a bit uncomfortable.
Could you perhaps post a graph of metamemory against measured memory, or perceived memory against memory (or both)? kat
 

noetsi

Fortran must die
#6
You could, but whether that is a good idea is outside my expertise. Before investing a lot of time in this, I would look at how the journal handles similar analyses, or talk to a coauthor (if you have one) or another expert in this field. I am personally not expert enough to advise you on either the graphs or the topic. There may be an accepted way of handling this in your field.

As far as I can tell, your predictor is not interval, so showing relationships is not simple. I would review graphical techniques for dummy variables, which is what your predictor seems to be: you are in one of two states, and whichever state you are in affects the dependent variable. I am not sure you have a non-linear relationship; you seem to have a dummy predictor. I am not sure it even makes sense to talk about a non-linear relationship with a categorical variable like this. I believe categorical variables are always linear in relation to the dependent variable (if your dependent variable is interval).

You might want to see what happens if you create a dummy variable and run a linear regression (assuming your dependent variable is interval).
 
#7
I think the title of my post was inaccurate. The overall relationship is linear, but the self-reported measure is a predictor only for good and bad memorizers. By the way, when I group quartiles 2 and 3 together, the relationship is still non-significant. In addition, I reanalyzed previously published reports and obtained the same pattern.
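For readers who want to reproduce this kind of grouped analysis, here is a minimal sketch of computing Pearson's r separately within each group. The helper names, group labels and numbers are made up for illustration; they are not my data.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_by_group(x, y, groups):
    """Pearson r computed separately within each group label."""
    result = {}
    for g in sorted(set(groups)):
        xs = [a for a, lab in zip(x, groups) if lab == g]
        ys = [b for b, lab in zip(y, groups) if lab == g]
        result[g] = pearson_r(xs, ys)
    return result

# Toy values: a clear relationship in the outer groups, none in the middle.
self_report = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
objective = [3, 2, 1, 5, 7, 7, 5, 3, 2, 1]
labels = ["Q1"] * 3 + ["Q2-3"] * 4 + ["Q4"] * 3
r_groups = r_by_group(self_report, objective, labels)
```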
Rplot.jpg
 

noetsi

Fortran must die
#8
If it is a categorical variable, it has to be linear.
Can you compare the level that does have a relationship with the DV to those that do not (different levels of the same variable)? It might be useful to plot the three different levels against the DV (for the whole data set, not specific quartiles).
 

noetsi

Fortran must die
#10
I am not that experienced with categorical plots although they are done.

It might be interesting to start by plotting a Tukey boxplot of metamemory for each of the 3 levels (that is, show a Tukey boxplot of metamemory first for bad memory, second for average memory, and third for good memory). There are a lot of alternative ways to do that, but a Tukey boxplot is very simple, so it's a good starting point.
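As a rough sketch, these are the numbers a Tukey boxplot draws for each level, computed with the Python standard library. The function name, the choice of the "inclusive" quartile method and the toy scores are assumptions for illustration.

```python
from statistics import quantiles

def tukey_summary(values):
    """Numbers behind a Tukey boxplot: quartiles, whisker ends, outliers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey fences: 1.5 times the interquartile range beyond the box.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # most extreme points within the fences
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up metamemory scores for the three memory levels.
metamemory_by_level = {
    "bad": [70, 75, 80, 82, 90],
    "average": [40, 45, 55, 60, 65],
    "good": [15, 20, 22, 30, 35],
}
summaries = {level: tukey_summary(v) for level, v in metamemory_by_level.items()}
```

Comparing the three summaries side by side gives the same information as the three boxes in the plot.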
 
#11
Hi both, thanks for your replies. Actually, the data are not categorical. The y axis represents scores on the objective measure. The maximum score that someone can get is 80. Higher scores represent better memory. The x axis represents self-reported memory. This was measured with a Likert-type questionnaire whose items describe daily memory problems. The maximum score a participant can get is 100. Higher scores represent lower self-reported memory.
 

katxt

Active Member
#12
"The y axis represents scores in the objective measure."
You are interested in how metamemory scores (DV) depend on objective memory scores (IV). Perhaps you should draw the graph with the objective score (IV) on the x axis. The regression lines now have a quite different look.
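A small sketch of why the lines look different once the axes are swapped: the least-squares line for predicting the DV from the IV is not the mirror image of the line fitted the other way round. The helper name and the toy scores are made up for illustration.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Toy scores: objective memory (IV) and self-reported memory problems (DV).
objective = [20, 35, 40, 55, 70]
self_report = [80, 60, 65, 40, 45]

slope_dv_on_iv = ols_slope(objective, self_report)   # self-report predicted from objective
slope_iv_on_dv = ols_slope(self_report, objective)   # objective predicted from self-report
r_squared = slope_dv_on_iv * slope_iv_on_dv
```

The product of the two slopes equals r squared, so the two fitted lines coincide only when the correlation is perfect.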