R-squared is too high

#1
I just want to ask how to lower the R-squared without ruining the model. My research is in economics, and I was told the R-squared is too high.
 

Karabiner

TS Contributor
#2
So maybe you could tell us something about the topic and the research question, the variables included,
the actual R² of your model, the sample size, and who said that to you and why?

With kind regards

Karabiner
 
#3
Hello! Thank you for replying. I'm an undergrad writing a thesis. The topic is human capital. The dependent variable is GDP per capita, and the independent variables are life expectancy and the completion rates of higher education and tech-voc, from 2000 to 2018. We are testing their relationship. The R² is 0.996134. It was my thesis adviser who told me that this is too high. She said that leaving only 0.1% of the variability unexplained is impossible. She is also dead set on making the R² lower.
 

Karabiner

TS Contributor
#5
Ok, what does she suggest as the reason for this extremely large R², and what does she suggest as remedy against this?

You did not report your sample size (important), number of variables (important), and what the model looks like (important).

Please explain your research question, your regression model, number of variables, sample size.

With kind regards

Karabiner
 

noetsi

No cake for spunky
#8
It is possible that you are regressing one variable on another when both are trending over time. That can produce artificially large slopes and, I assume, artificially high R² values. I also think, although I have not personally encountered this, that having too many predictors relative to your sample size might be an issue.
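
A quick way to check the predictors-versus-sample-size worry is the adjusted R², which penalizes R² for the number of predictors. This is only a sketch: the sample size of 19 (one observation per year, 2000 to 2018) and the count of 3 predictors are assumptions read off the description in the thread, not figures the original poster confirmed, and `adjusted_r_squared` is just an illustrative helper name.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R²: 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


r2 = 0.996134        # R² reported in the thread
n_obs = 19           # ASSUMPTION: annual data, 2000 through 2018
n_predictors = 3     # ASSUMPTION: life expectancy, higher-ed and tech-voc completion

unexplained = 1.0 - r2
adj = adjusted_r_squared(r2, n_obs, n_predictors)

print(f"unexplained share of variance: {unexplained:.4%}")
print(f"adjusted R²: {adj:.6f}")
```

Under those assumptions the adjusted value stays at about 0.995, so the number of predictors alone would not account for an R² this high.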
 

spunky

Can't make spagetti
#10
From 2000 to 2018? What are you doing to model the time dependency?
 

noetsi

No cake for spunky
#11
If you have two time series, that is, two data sets that both move over time, and you don't address that, you will get absurdly high R-squared values that mean nothing. That is the point I was trying to make earlier.

There are no simple solutions for that, unfortunately. I don't even pay attention to R-squared, and I don't think it receives much emphasis in the literature generally.

The real issue is dealing with the time dependency. R square just reflects it.
 

noetsi

No cake for spunky
#15
Yes, but any real data involving humans is never R² = 0.996134, even correlating with time.
In fact, regressing real data that trends over time on other real data that trends over time can generate extremely high R-squared values. That is how you know there is a problem. Time is highly correlated with itself, and that shared trend is what you are measuring, not anything substantive.

That is why time series regression is used.
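
To illustrate the point about shared trends, here is a small simulation sketch using only the Python standard library. It generates pairs of series that are unrelated by construction but both drift upward, then compares the R² from regressing one on the other in levels with the R² after first differencing. The drift, noise level, series length of 19, and all function names are assumptions chosen for illustration; this is not the original poster's data or model, and differencing is only one of several ways to handle time dependency.

```python
import random


def r_squared(x, y):
    """R² of a simple OLS regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def trending_series(rng, n, drift=1.0, noise_sd=0.3):
    """Random walk with drift: each step adds a trend plus independent noise."""
    level, out = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, noise_sd)
        out.append(level)
    return out


def diff(series):
    """First differences: year-over-year changes."""
    return [b - a for a, b in zip(series, series[1:])]


def simulate(n_obs=19, n_sims=500, seed=0):
    """Average R² in levels and in differences over many independent pairs."""
    rng = random.Random(seed)
    levels, diffs = [], []
    for _ in range(n_sims):
        x = trending_series(rng, n_obs)  # generated independently of y
        y = trending_series(rng, n_obs)
        levels.append(r_squared(x, y))
        diffs.append(r_squared(diff(x), diff(y)))
    return sum(levels) / n_sims, sum(diffs) / n_sims


mean_levels, mean_diffs = simulate()
print(f"mean R² in levels:      {mean_levels:.3f}")
print(f"mean R² in differences: {mean_diffs:.3f}")
```

In levels the two series look almost perfectly related even though neither influences the other; once the trend is differenced away, most of that apparent fit disappears.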