Why do degrees of freedom matter?

I work with Generalised Linear Models, and we use tools such as the chi-squared (likelihood-ratio) test and the AIC to compare how good models are. These comparisons trade off how well the model fits against how many degrees of freedom (i.e. number of observations less number of parameters) remain.
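As a concrete illustration of that trade-off, here is a minimal numpy sketch. It assumes Gaussian errors, so the least-squares form of the criterion, AIC = n·ln(RSS/n) + 2k, applies up to an additive constant; the data and the two candidate models are made up purely for illustration.

```python
import numpy as np

def aic_ls(rss, n, k):
    """Least-squares AIC: n*ln(RSS/n) + 2k (Gaussian errors, up to a constant)."""
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + rng.normal(scale=0.3, size=n)     # true relationship is linear

results = {}
for degree in (1, 5):
    coefs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coefs, x)) ** 2))
    k = degree + 1                              # parameters, intercept included
    results[degree] = {"rss": rss, "df": n - k, "aic": aic_ls(rss, n, k)}

print(results)
```

The degree-5 model always achieves a lower RSS than the degree-1 model (it nests it), but the 2k penalty in the AIC charges it for the extra parameters, which is exactly the fit-versus-degrees-of-freedom trade-off in question.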

However, I was wondering: what is the theoretical justification for wanting high degrees of freedom (or, conversely, for wanting a low number of parameters)?



Cookie Scientist
The principle is that we want our model to provide an accurate but parsimonious description of the data. As the number of parameters in the model approaches the number of data points, the model will be better able to accurately fit any arbitrarily complex dataset, but the tradeoff is that the model is also less and less parsimonious. In the limit where there are just as many parameters as data points, all the model has really done is provide a verbatim redescription of the dataset, so it's really not clear that we've learned anything. In practical terms, when the ratio of parameters to data points becomes too high, the generalization error of the model (i.e., the ability of the model to predict data points not found in the original data set from which parameters were estimated) suffers.
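This limiting case is easy to demonstrate numerically. The sketch below (my own illustration; the data, seed, and choice of a degree-(n−1) polynomial as the "saturated" model are not from the thread) fits a model with as many parameters as data points: it reproduces the training data essentially verbatim, while out-of-sample error has to be checked separately.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x_train = np.linspace(0.0, 1.0, n)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=n)
x_test = np.linspace(0.05, 0.95, n)            # new draws from the same process
y_test = 2.0 * x_test + rng.normal(scale=0.3, size=n)

# Saturated model: a degree-(n-1) polynomial has n parameters, one per data point,
# so it interpolates the training data (near-)exactly.
sat = np.polyfit(x_train, y_train, n - 1)
train_rss_sat = float(np.sum((y_train - np.polyval(sat, x_train)) ** 2))

# Parsimonious model: a straight line with 2 parameters.
lin = np.polyfit(x_train, y_train, 1)
test_rss_sat = float(np.sum((y_test - np.polyval(sat, x_test)) ** 2))
test_rss_lin = float(np.sum((y_test - np.polyval(lin, x_test)) ** 2))

print(train_rss_sat, test_rss_sat, test_rss_lin)
```

The saturated model's training RSS is numerically zero, yet its error on the fresh test points cannot be: the "perfect fit" is just a redescription of the noise in the training sample.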


Fortran must die
While adding more variables does eat up degrees of freedom, parsimony is not really tied to that directly. It is based on the conceptual argument that as you add variables, understanding what you get from the data becomes ever more difficult. Therefore you always want the fewest variables in your model that still adequately explain the data.
Thanks Jake,

So would I be correct in saying that:

We are interested in maximising degrees of freedom because this minimises the generalization error and Type I error for the model. This is because more parsimonious models have smaller standard errors, and so are more likely to predict well for out-of-sample data.
Hi Noetsi,

I'm a bit confused by your use of the word 'adequate'. We need to find the best/most predictive models, and we are always forced to choose between better-fitting models with more parameters and more weakly fitting ones with fewer parameters. As such, we need to justify why we are rejecting those better-fitting models, since you can virtually always improve the fit by adding another parameter.


Ambassador to the humans
We need to find the best/most predictive models,
Note that adding parameters doesn't necessarily improve the predictive power of the model. It will always give you a better fit to the data you have at hand, but whether it helps you predict out-of-sample data is something that needs to be considered as well.


Fortran must die
There are two ways to look at this issue, the justification for parsimony (I actually used the word as I had seen it used). One is that there is a trade-off between increased explanation of variance (a "better" model) and complexity. In theory you could add every variable that actually contributes to a phenomenon to a model, and you would explain all the variance - so you would have the "best" model. But you would end up with a model with virtually no analytical usefulness, because it would be too complex to actually use given the large number of variables in it. In science, sometimes simplicity is better even if you lose some explanatory value as a result.

Another approach to parsimony, and I think this is the one you are using, is that if two models have approximately the same predictive power, then you choose the one with fewer variables. Commonly you do a chi-square difference test between a more complex model (which has more variables) and a less complex one. Only if the result of the test is statistically significant do you choose the more complex model.
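That chi-square difference test can be sketched for nested Gaussian linear models, where the likelihood-ratio statistic reduces to n·ln(RSS_reduced/RSS_full) with degrees of freedom equal to the difference in parameter counts. The data and variables below are invented for illustration, with x2 given no real effect so the test should usually favour the simpler model.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # x2 has no real effect here
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

X_reduced = np.column_stack([np.ones(n), x1])        # 2 parameters
X_full = np.column_stack([np.ones(n), x1, x2])       # 3 parameters

# Likelihood-ratio ("chi-square difference") statistic for nested Gaussian models
lr_stat = n * np.log(rss(X_reduced, y) / rss(X_full, y))
df_diff = X_full.shape[1] - X_reduced.shape[1]
p_value = chi2.sf(lr_stat, df_diff)
print(lr_stat, df_diff, p_value)
```

Because the models are nested, the full model's RSS can never exceed the reduced model's, so the statistic is non-negative; only when it is large relative to a chi-squared distribution on df_diff degrees of freedom would you keep the extra variable.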