How to pick the better model parameter in this example?

Let's say the model is fixed (e.g., a linear model where 5 coefficients need to be fit). Denote the parameter (coefficient) vector as [TEX]\beta[/TEX].

First, you are given 50 data points (X, y) and use them to get an estimate of [TEX]\beta[/TEX], say [TEX]\beta_1[/TEX].
Then, you are given an additional 50 data points (so now you have 100) and you can use all of them to get another estimate of [TEX]\beta[/TEX], say [TEX]\beta_2[/TEX].

Which [TEX]\beta[/TEX] should you pick? And what is a good criterion and method?

I think that the second one, fit on 100 data points, should have a lower RSS/n (training error) than the first one (is that correct?). However, I think training error is not a good evaluation metric and we should use test error instead. But how do we get a test error in this case? Splitting the first group into (25, 25), with half used for testing, and similarly for the second group, does not seem fair.

Does anybody have a better solution? Thanks!


Fortran must die
You would not normally fit a linear model where part of the data is used to estimate one parameter and a second set of data is used to estimate a second parameter. If the data sets were equivalent, it would be better to use more cases to estimate both parameters at once. If they weren't equivalent, you could not use them to estimate Y in the same model.


Ambassador to the humans
I think they're talking about estimating the same parameters in both cases. Which is why I asked why they wouldn't want to use all of the data...


Fortran must die
Ah, I thought they were creating two different parameters, b1 and b2, and estimating them from different data sets.

Your example makes a lot more sense.