# R-squared is too high

#### LoulouVille

##### New Member
I'd like to ask how to lower the R-squared without ruining the model. My research is in economics, and I've been told the R-squared is too high.

#### Karabiner

##### TS Contributor
So maybe you could tell us something about the topic and the research question, the variables included,
the actual R² of your model, the sample size, and who said that to you and why?

With kind regards

Karabiner

#### LoulouVille

##### New Member
Hello! Thank you for replying. I'm an undergrad writing my thesis. The topic is human capital. The dependent variable is GDP per capita, and the independent variables are life expectancy, and completion rate of higher ed and techvoc, from 2000 to 2018. We are testing their relationship. The R² is 0.996134. It was my thesis adviser who told me that this is too high. She said that having only 0.1% of unexplained variability is impossible. She is also dead set on making the R² lower.

#### katxt

##### Well-Known Member
> She said that only 0.1% of unexplained variability is impossible.

I agree with her. Only the most precise physics experiments can be up there.
It's almost as if you used the DV twice: once as the DV and again as an IV.

#### Karabiner

##### TS Contributor
OK, what does she suggest as the reason for this extremely large R², and what does she suggest as a remedy for it?

You did not report your sample size (important), number of variables (important), and what the model looks like (important).

Please explain your research question, your regression model, number of variables, sample size.

With kind regards

Karabiner

#### fed2

##### Active Member
You can try lowering R-squared by adding random numbers as a predictor. Hope that helps, fed2.
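As a check on this suggestion: in ordinary least squares with an intercept, adding a predictor cannot lower plain R²; only adjusted R² can fall. Below is a minimal sketch with made-up data (19 observations to mirror 2000 to 2018; the coefficients, seed, and helper name `r_squared` are assumptions for illustration, not the poster's model).

```python
import numpy as np

def r_squared(X, y):
    """Plain and adjusted R² for an OLS fit of y on X (intercept added here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    k = A.shape[1] - 1  # number of predictors, excluding the intercept
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(0)
n = 19  # hypothetical: one observation per year, 2000 to 2018
x = rng.normal(size=(n, 3))  # three synthetic predictors
y = x @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
noise = rng.normal(size=(n, 1))  # a pure random-number predictor

r2_base, adj_base = r_squared(x, y)
r2_more, adj_more = r_squared(np.column_stack([x, noise]), y)

print(f"without noise predictor: R2={r2_base:.4f}, adjusted={adj_base:.4f}")
print(f"with noise predictor:    R2={r2_more:.4f}, adjusted={adj_more:.4f}")
```

Plain R² stays the same or rises because the larger model can always reproduce the smaller model's fit by setting the new coefficient to zero.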

#### katxt

##### Well-Known Member
Or perhaps you included the DV as one of your predictors by mistake?

#### noetsi

##### No cake for spunky
It is possible that you are analyzing two variables that are both moving in time. That can generate artificially high slopes and, I assume, R² values. I think, although I have not personally encountered this, that having too many predictors relative to your sample size might also be an issue.
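The point about too many predictors relative to sample size can be sketched with pure noise. Assuming 19 observations and predictors that are unrelated to the response by construction, the average R² is roughly k/(n-1) for k predictors. The helper `noise_r2`, the seed, and the values of k are illustrative assumptions.

```python
import numpy as np

def noise_r2(n, k, rng):
    """R² from regressing pure-noise y on k pure-noise predictors plus an intercept."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = rng.normal(size=n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(1)
n = 19  # hypothetical sample size, one row per year
mean_r2 = {
    k: float(np.mean([noise_r2(n, k, rng) for _ in range(500)]))
    for k in (3, 10, 15)
}

for k, value in mean_r2.items():
    print(f"k={k:2d} predictors: mean R2={value:.3f}, k/(n-1)={k / (n - 1):.3f}")
```

With only three predictors this effect alone is modest, so it would not by itself account for an R² near 0.996.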

#### katxt

##### Well-Known Member
How about posting your data as a text file?

#### spunky

##### Can't make spagetti
> Hello! Thank you for replying. I'm an undergrad writing my thesis. The topic is human capital. The dependent variable is GDP per capita, and the independent variables are life expectancy, and completion rate of higher ed and techvoc, from 2000 to 2018. We are testing their relationship. The R² is 0.996134. It was my thesis adviser who told me that this is too high. She said that having only 0.1% of unexplained variability is impossible. She is also dead set on making the R² lower.

From 2000 to 2018? What are you doing to model the time dependency?

#### noetsi

##### No cake for spunky
If two time series, that is, two data sets that are both moving in time, are regressed on each other and you don't address that, you will get absurd R-squared values that mean nothing. That is the point I was trying to make earlier.

There are no simple solutions for that, unfortunately. I don't even pay attention to R-squared. I don't think it receives emphasis in the literature generally.

The real issue is dealing with the time dependency. R square just reflects it.
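To illustrate how shared movement in time inflates R², here is a minimal simulation sketch. It assumes two series that are independent by construction but both drift upward over 19 periods; the drifts, noise levels, seed, and the helper `trending_series` are made up for the example.

```python
import numpy as np

def trending_series(n, drift, sd, rng):
    """Random walk with drift: each period adds the drift plus independent noise."""
    return np.cumsum(drift + sd * rng.normal(size=n))

rng = np.random.default_rng(2)
n = 19  # hypothetical: annual data, 2000 to 2018
r2_levels = []
for _ in range(1000):
    x = trending_series(n, 1.0, 0.3, rng)
    y = trending_series(n, 2.0, 0.6, rng)  # generated independently of x
    r = np.corrcoef(x, y)[0, 1]
    r2_levels.append(r ** 2)  # for one predictor, R² equals the squared correlation

median_r2 = float(np.median(r2_levels))
print(f"median R2 between unrelated trending series: {median_r2:.3f}")
```

The series share nothing except an upward trend, so a high R² here reflects time, not a relationship between the variables.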

#### katxt

##### Well-Known Member
The R² value is so high that a gross mistake seems far more likely than any statistical explanation.

#### noetsi

##### No cake for spunky
> The R² value is so high that a gross mistake seems far more likely than any statistical explanation.

It is what one would expect when correlating two series that are moving in time. You are correlating time with time.

#### katxt

##### Well-Known Member
Yes, but real data involving humans never gives R² = 0.996134, even when correlating with time.

#### noetsi

##### No cake for spunky
> Yes, but real data involving humans never gives R² = 0.996134, even when correlating with time.

In fact, real data that moves in time, compared with other real data that moves in time, can generate extremely high R-squared values. That is how you know there is a problem. Time is highly correlated with itself, and that is what you are measuring, not anything substantive.

That is why time series regression is used.
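One common first step in that direction, offered here only as a sketch and not as the full treatment, is to regress changes on changes (first differences) instead of levels. The example assumes two independent, upward-drifting series with made-up parameters; `simple_r2` is a hypothetical helper.

```python
import numpy as np

def simple_r2(x, y):
    """R² of a one-predictor regression, which equals the squared correlation."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

rng = np.random.default_rng(3)
n = 19  # hypothetical: annual data, 2000 to 2018
levels, diffs = [], []
for _ in range(1000):
    # Two series generated independently; both drift upward over time.
    x = np.cumsum(1.0 + 0.3 * rng.normal(size=n))
    y = np.cumsum(2.0 + 0.6 * rng.normal(size=n))
    levels.append(simple_r2(x, y))
    # First differences remove the shared trend (year-over-year changes).
    diffs.append(simple_r2(np.diff(x), np.diff(y)))

median_levels = float(np.median(levels))
median_diffs = float(np.median(diffs))
print(f"median R2 in levels:            {median_levels:.3f}")
print(f"median R2 in first differences: {median_diffs:.3f}")
```

If a relationship survives differencing in real data, that is more informative than a high R² in levels; whether differencing is appropriate depends on the series, which this sketch does not test.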