proportion of explained variance

#1
Hello everyone,

I am studying psychology and I have set myself the goal of getting an intuitive understanding of what I learn in stats. That worked out well in the first year, but now I have got to models like linear regression and feel like I no longer fully understand what's going on. I can do things in software and use the jargon, but I cannot really explain to myself why certain things are possible, and I really hate that!

One of the terms I cannot really make sense of is "proportion of explained variance". It comes up, for example, in the context of multiple linear regression or factor analysis. Let's say the term indicates how much of the variance in y is explained by the predictor variable x.

Can you give me a practical example of what is meant by "explained"? I imagine something like a column of y scores and an adjacent column of x scores. Would 100% "explained variance" imply that when x doubles, y doubles too? Or how should I picture the relation between x and y when people talk about explained variance?

I feel stupid and would be thankful if someone could help me with this and maybe other questions.
Cheers!
 

hlsmith

#2
I'll start by providing a basic piece of information: if you are conducting 'simple' linear regression (a single predictor), R^2 equals the Pearson correlation coefficient squared. It is literally r (i.e., the correlation) squared. So it reflects the strength of the linear association between X and Y. It is good to try to make sense of these things; many of them are not that transparent.
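A minimal Python sketch to check this numerically. The x and y values are made-up toy data (hypothetical, not from any study); the code computes Pearson r by hand, fits a least-squares line y = b0 + b1*x, and compares r squared with the regression's R squared (SSR/SST).

```python
# Hypothetical toy data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(v):
    return sum(v) / len(v)


def pearson_r(x, y):
    """Pearson correlation: summed co-deviation scaled by both spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares and return SSR/SST."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    y_hat = [b0 + b1 * a for a in x]
    ssr = sum((h - my) ** 2 for h in y_hat)  # sum of squares explained by the line
    sst = sum((b - my) ** 2 for b in y)      # total sum of squares around the mean
    return ssr / sst


r = pearson_r(x, y)
print(round(r ** 2, 6), round(r_squared_simple_ols(x, y), 6))  # prints: 0.6 0.6
```

With more than one predictor this one-to-one link to a single Pearson r no longer holds in the same simple form, which is why the sketch sticks to one x.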
 
#3
Alright, thank you already; I didn't have that point in mind.

My understanding of a high positive r-value would be that it describes how often relatively high scores on x (compared to the mean of x) occur in the same cases as relatively high scores on y (compared to the mean of y).
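One way to make "relatively high compared to the mean" precise is to standardize both variables: Pearson r equals the average product of paired z-scores when the z-scores use the population standard deviation (dividing by n). A case pushes r up when x and y sit on the same side of their means, and it pushes harder the further out both are, so r weights cases by distance rather than simply counting them. A minimal sketch, with hypothetical toy data:

```python
def mean(v):
    return sum(v) / len(v)


def z_scores(v):
    """Distance from the mean in units of the population SD (divides by n)."""
    m = mean(v)
    sd = (sum((a - m) ** 2 for a in v) / len(v)) ** 0.5
    return [(a - m) / sd for a in v]


def r_from_z(x, y):
    """Pearson r as the average product of paired z-scores."""
    return mean([zx * zy for zx, zy in zip(z_scores(x), z_scores(y))])


# Hypothetical toy data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(r_from_z(x, y), 4))  # prints: 0.7746
```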

If the proportion of explained variance is just r squared, then it cannot add anything conceptually to this meaning of r?
Is that what you are suggesting, or am I missing something?
 

obh

#4
Hi Isabatt,

I think "proportion of explained variance" is the best description of R squared :) (at least when the constant is not forced to be zero)

If you have data with no predictors, the best prediction (in the least-squares sense) is the average of Y, y̅.
With a regression model, each observation instead gets a predicted value: ŷi

You compare the observed value yi to the average of Y, y̅: yi - y̅

Part of this difference is explained by the regression model: ŷi - y̅
The part that is not explained by the model is: yi - ŷi

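SST = Σ(yi - y̅)^2
SST/n is exactly the formula for the variance of Y.

SSR = Σ(ŷi - y̅)^2
SSR/n: you can say that this is the variance of the explained part (the estimated Y), i.e. the explained variance.

SSE = Σ(yi - ŷi)^2 is the unexplained (residual) part. For least-squares regression with a constant, SST = SSR + SSE, which is why the ratio below is a proportion between 0 and 1.

R Squared = SSR/SST = (SSR/n) / (SST/n) = Explained variance / Variance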
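A small Python sketch of these sums of squares, using hypothetical toy data made up for illustration. It fits a least-squares line with a constant, computes SST, SSR and SSE, checks that SST = SSR + SSE, and shows that dividing both SSR and SST by n leaves the ratio unchanged.

```python
# Hypothetical toy data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (the model includes a constant).
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total: observed vs mean
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained: predicted vs mean
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained: observed vs predicted

print(round(sst, 6), round(ssr + sse, 6))  # prints: 6.0 6.0  (SST = SSR + SSE)
print(round((ssr / n) / (sst / n), 6))     # prints: 0.6  (explained variance / variance)
```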

Does that make sense?

Diagram
http://www.statskingdom.com/doc_linear_regression.html#regression_sst_ssr
 
#5
Thank you, I will digest that for a moment.

If I express the meaning of a "perfect" R2 = 1 in my own words, does it mean that the predicted scores our model provides deviate from the mean of y by the same amount as the observed scores do?
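A quick numerical look at the R² = 1 case, assuming made-up data where y is an exact linear function of x (y = 1 + 2x, chosen only for illustration). Here the unexplained part yi - ŷi is zero for every observation, so each predicted score equals its observed score, not just the totals. The example also bears on the "does y double when x doubles" question: going from x = 1 to x = 2 takes y from 3 to 5, yet R² is still 1, because R² measures fit to a straight line, not proportionality.

```python
def fit_and_r_squared(x, y):
    """Least-squares line with a constant; returns (R^2, predicted values)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr / sst, y_hat


# Hypothetical data: an exact (but not proportional) linear relation.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * a for a in x]  # 3, 5, 7, 9

r2, y_hat = fit_and_r_squared(x, y)
print(r2)     # prints: 1.0
print(y_hat)  # prints: [3.0, 5.0, 7.0, 9.0]
```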