# Conjecture: "SST=SSR+SSE does not imply y(x) is an optimal sLR-line" (sLR:=simple Linear Regression)

#### Hermann

##### New Member
Let y(x) = a + bx be a regression line with intercept a and slope b, where b ≠ 0. Then:
I can prove: y(x) is an optimal regression line ==> SST = SSR + SSE (*)
Sketch of proof: Show that SST = SSR + SSE under the condition that y(x) is optimal, i.e. dR²/da = 0 (1) and dR²/db = 0 (2) for y(x).
(R² := 1 - SSE/SST, "R-squared".) Conditions (1) and (2) then lead to an equation which proves (*)...
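
The sketch above can be written out as a short derivation. It assumes the usual definitions, which the post does not spell out: fitted values ŷ_i = a + b x_i, residuals e_i = y_i − ŷ_i, SSE = Σ e_i², SSR = Σ (ŷ_i − ȳ)² and SST = Σ (y_i − ȳ)². Since SST does not depend on a or b, conditions (1) and (2) are the same as setting the derivatives of SSE to zero.

```latex
\begin{aligned}
\mathrm{SST} &= \sum_i (y_i - \bar y)^2
  = \sum_i \bigl[(y_i - \hat y_i) + (\hat y_i - \bar y)\bigr]^2 \\
 &= \underbrace{\sum_i (y_i - \hat y_i)^2}_{\mathrm{SSE}}
  + \underbrace{\sum_i (\hat y_i - \bar y)^2}_{\mathrm{SSR}}
  + 2 \sum_i e_i (\hat y_i - \bar y), \\[4pt]
\sum_i e_i (\hat y_i - \bar y)
 &= (a - \bar y) \sum_i e_i + b \sum_i e_i x_i, \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2 \sum_i e_i = 0,
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial b} = -2 \sum_i e_i x_i = 0 .
\end{aligned}
```

Under (1) and (2) both sums in the second line vanish, so the cross term is zero and (*) follows. Read this way, (*) is equivalent to the single condition Σ e_i (ŷ_i − ȳ) = 0.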

My idea: The proof of the above conjecture is "not easy" and maybe a little hard to understand. It would be easier to construct a counterexample.
To construct a counterexample: define a training set TS = {observation points} and an sLR line which satisfies condition (*) but is not an optimal sLR line.

My question: Does an easy counterexample exist? Or any other ideas?

#### Hermann

##### New Member
Thanks. But this is the direction of the proof: (y(x) is an optimal sLR) ===> (SST = SSR + SSE) (in words: "the left statement implies the right statement"). This is shown in your remark. I have seen this and it is clear to me (thanks for your confirmation!).
But I am interested in the other direction. In words: "From (SST = SSR + SSE) it follows that (y(x) is an optimal sLR)". I think this is wrong! So for me, as long as I cannot prove it, it is a conjecture. So I am looking for an easy counterexample to make a theorem out of this conjecture. Did you understand my problem? Do you think I'm right with my conjecture? Or do you need some more details? PS: Sorry for my "poor" English, I'm not a native English speaker.

#### fed2

##### Member
I'm thinking this has the flavor of the Gauss-Markov theorem. So if optimal is 'BLUE', then the least squares estimate, for which (*) holds, is optimal when the data have homogeneous variance. But it would not be true when the y are not homoskedastic. So that's a counterexample, I reckon.

#### Dason

##### Ambassador to the humans
I think OP needs to define what they mean by 'optimal regression'

#### Hermann

##### New Member
An optimal simple linear regression line is, by definition, a line for which R² is maximal. R² := 1 - SSE/SST (the well-known R-squared condition).

#### Dason

##### Ambassador to the humans
Ok. That is equivalent to what I would consider the more well known condition of minimizing SSE.
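
The equivalence mentioned here can be checked numerically. The following is a minimal sketch, assuming the standard closed-form least squares estimates for slope and intercept; the data values and function names are made up for illustration and do not come from the thread. Because SST depends only on the y values, R² = 1 - SSE/SST is largest exactly where SSE is smallest.

```python
def ols_fit(xs, ys):
    """Closed-form least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def r_squared(xs, ys, a, b):
    """R^2 = 1 - SSE/SST; SST does not depend on a or b."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse(xs, ys, a, b) / sst


# Illustrative data (an assumption, not taken from the thread).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

a_hat, b_hat = ols_fit(xs, ys)
print("least squares line:", a_hat, b_hat)
print("R^2 at the fit:    ", r_squared(xs, ys, a_hat, b_hat))
print("R^2 after a nudge: ", r_squared(xs, ys, a_hat + 0.1, b_hat))
```

Nudging either parameter away from the least squares values raises SSE and therefore lowers R², which is what the equivalence predicts.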

#### fed2

##### Member
My spider sense is telling me that this definition of optimality may be problematic. That is, doesn't it create an automatic tautology between optimality and least squares? I.e. you have defined optimality to be equivalent to least squares (if and only if)?

#### Hermann

##### New Member
Hello fed2: I use the definition: a line is an "optimal" simple linear regression (sLR) line if R² is maximal (which is the same as SSE being minimal). So this is just another formulation of the R² condition. I use R² to calculate, with the "least squares fit" method, the parameters a, b of an optimal sLR line y(x) = a + b*x (a = intercept and b = slope of the line).
So I think I use the correct definitions... (I guess there is no tautology).
I can prove: (optimal sLR) ===> SST = SSR + SSE. My question/problem is: Is the other direction "<===" true or false? Did you understand? Who can answer the question? Any ideas?

#### fed2

##### Member
Hmm, maybe it isn't. I think the reverse implication you are referring to is essentially a statement of Pythagoras' theorem:

"If in a triangle the square on one of the sides equals the sum of the squares on the remaining two sides of the triangle, then the angle contained by the remaining two sides of the triangle is right."

Try working that; it should show that it always holds, or I'm wrong, one of those.
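
One way to work this through numerically is sketched below. The data set and the candidate line are illustrative choices, not taken from the thread, and the code assumes the usual definitions SSE = Σ (y_i − ŷ_i)², SSR = Σ (ŷ_i − ȳ)² and SST = Σ (y_i − ȳ)². For the points (-1, -1), (0, 0), (1, 1), expanding SST − SSR − SSE for a line ŷ = a + b x gives the condition 3a² = 2b(1 − b). The least squares line (a = 0, b = 1) satisfies it, but so does b = 1/2 with a = 1/√6.

```python
import math


def sums_of_squares(xs, ys, a, b):
    """Return (SST, SSR, SSE) for the line y = a + b*x on the given data."""
    y_bar = sum(ys) / len(ys)
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return sst, ssr, sse


# Illustrative training set (an assumption): three points on the line y = x.
xs = [-1.0, 0.0, 1.0]
ys = [-1.0, 0.0, 1.0]

# Least squares line: a = 0, b = 1, which fits these points exactly.
sst_ols, ssr_ols, sse_ols = sums_of_squares(xs, ys, 0.0, 1.0)

# Candidate line chosen so that 3*a**2 == 2*b*(1 - b) with b = 0.5.
a_alt, b_alt = 1.0 / math.sqrt(6.0), 0.5
sst_alt, ssr_alt, sse_alt = sums_of_squares(xs, ys, a_alt, b_alt)

print("least squares:", sst_ols, ssr_ols, sse_ols)
print("candidate:    ", sst_alt, ssr_alt, sse_alt)
print("SST - (SSR + SSE) for candidate:", sst_alt - (ssr_alt + sse_alt))
```

On this data the candidate line has slope b ≠ 0, satisfies (*) up to rounding, and has a larger SSE than the least squares line, so it looks like a counterexample to the reverse implication. This fits the Pythagoras picture: (*) makes the residual vector orthogonal to the single vector ŷ − ȳ, while least squares asks for orthogonality to both the constant vector and x, which is a stronger requirement.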