First, you are given 50 data points (X, y) and use them to fit a [TEX]\beta[/TEX], say [TEX]\beta_1[/TEX];

Then you are given 50 additional data points (so you now have 100) and can use all of them to fit another [TEX]\beta[/TEX], say [TEX]\beta_2[/TEX].

Which [TEX]\beta[/TEX] should you pick? And what is a good criterion and method?

I think the second fit, on 100 data points, should have a lower RSS/n (training error) than the first one (is that correct?). However, training error is not a good evaluation metric, and we should use test error instead. But how do we do that in this case? Splitting the first group into (25, 25) -- half for fitting, half for testing -- and doing the same for the second group does not seem fair.
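To make the comparison concrete, here is a toy sketch of what I mean (synthetic data and NumPy OLS, purely illustrative): fit [TEX]\beta_1[/TEX] on the first 50 points and [TEX]\beta_2[/TEX] on all 100, compare their training RSS/n, and also run a simple 5-fold cross-validation on the full 100 as one possible way to estimate test error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: y = X @ beta_true + noise
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    """Ordinary least-squares estimate of beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mse(X, y, beta):
    """Training/test error RSS/n for a given beta."""
    r = y - X @ beta
    return float(r @ r) / len(y)

# beta_1 from the first 50 points, beta_2 from all 100
beta_1 = fit_ols(X[:50], y[:50])
beta_2 = fit_ols(X, y)

print("RSS/n of beta_1 on its 50 training points :", mse(X[:50], y[:50], beta_1))
print("RSS/n of beta_2 on its 100 training points:", mse(X, y, beta_2))

def cv_mse(X, y, k=5, seed=0):
    """5-fold cross-validated MSE: hold out each fold in turn."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        b = fit_ols(X[train], y[train])
        errs.append(mse(X[fold], y[fold], b))
    return float(np.mean(errs))

print("5-fold CV estimate of test error on all 100:", cv_mse(X, y))
```

The training RSS/n values here are computed on different data sets (50 vs. 100 points), which is exactly why comparing them directly feels unfair; the cross-validation at the end is one standard workaround, though it only evaluates the 100-point setting.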

Does anybody have a better solution? Thanks!