Variable restriction in (exhaustive) model selection

#1
Hi all,

I have a high number of variables (around 80) with which to model an intermediate-size sample (around 50 points) using GLMs.
I would like to do an exhaustive search for the "best" model, but using all of the variables in an exhaustive search (or a semi-exhaustive one, like glmulti's genetic algorithm) does not seem to be an option.

Could I somehow perform an exhaustive search of all, say, 8-variable models without having to write a custom script for that? Does anybody know of a package that allows restricting the number of variables in a model?
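To get a feel for the size of the search described above, here is a rough count in Python (used only as a calculator; it is not tied to glmulti or any modelling package). The numbers 80 and 8 come from the post; the variable names are just for illustration.

```python
import math

n_vars = 80   # candidate predictors, as in the post
k = 8         # model size of interest

# Number of distinct models with exactly k main-effect terms.
exactly_k = math.comb(n_vars, k)
# Number of models with at most k terms (including the intercept-only model).
up_to_k = sum(math.comb(n_vars, j) for j in range(k + 1))
# Number of models if every subset of the variables were allowed.
all_subsets = 2 ** n_vars

print(f"models with exactly {k} variables: {exactly_k:.3e}")
print(f"models with at most {k} variables: {up_to_k:.3e}")
print(f"all possible subsets: {all_subsets:.3e}")
```

Even with the size restriction, the count is on the order of 10^10 models, so a fully exhaustive pass over 8-variable models is likely to be slow unless each fit is very cheap.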

Many thanks!
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
General comment: will the output be penalized to address false discoveries arising from the plethora of possible models explored? Moreover, with such sparse data, your results will likely not generalize well outside the sample, even if cross-validation is used!
 
#4
Hi hlsmith,

I agree with your comment. At this stage, I am simply doing an exploratory search to see which of my variables could (in theory) be used in subsequent analyses or experiments. Statistical robustness with regard to significance and extrapolation is not yet my concern, as this is still very preliminary.

In other news, the maxsize parameter of glmulti() did not help much, since the function still produces a warning and stops if the number of variables is more than ~33. If anybody knows a workaround, I would be grateful!
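For reference, this is roughly what a size-restricted exhaustive search looks like when written by hand. It is a minimal sketch in plain Python, not a glmulti workaround: it assumes a Gaussian model with identity link (ordinary least squares) scored by AIC, and the helper names (solve, gaussian_aic, best_of_size) and the toy data are made up for illustration.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gaussian_aic(X, y, cols):
    """AIC (up to an additive constant) of a least-squares fit on an intercept plus `cols`."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    n = len(y)
    # p coefficients plus the error variance count as parameters.
    return n * math.log(rss / n) + 2 * (p + 1)


def best_of_size(X, y, k):
    """Score every k-variable model and return (best_aic, best_columns)."""
    n_vars = len(X[0])
    return min((gaussian_aic(X, y, cols), cols)
               for cols in itertools.combinations(range(n_vars), k))


# Toy data (hypothetical): 50 points, 10 variables, response driven by variables 2 and 7.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(50)]
y = [3 * row[2] - 2 * row[7] + random.gauss(0, 0.5) for row in X]

aic, cols = best_of_size(X, y, 2)
print(cols, round(aic, 2))
```

The loop over itertools.combinations is the only part that enforces the size restriction; the fitting function could be swapped for any other GLM fit. With 80 variables and k = 8 the number of combinations is very large, so this is only practical for small k or after reducing the candidate set.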
 

hlsmith

#5
I believe there is an approach that uses cross-validation and is usually called best subset selection. Searching for "best subset" may be fruitful.
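As a rough illustration of the idea, here is a minimal sketch of best subset selection scored by k-fold cross-validation. It is written in plain Python under simplifying assumptions (least-squares fits, a fixed interleaved fold assignment), and the function names and toy data are hypothetical rather than taken from any package.

```python
import itertools
import random


def fit_ols(rows, y):
    """Least-squares coefficients via normal equations and Gaussian elimination."""
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (m[i][p] - s) / m[i][i]
    return beta


def cv_error(X, y, cols, folds=5):
    """Mean squared prediction error of the model on `cols`, estimated by k-fold CV."""
    n = len(y)
    design = [[1.0] + [row[c] for c in cols] for row in X]
    total = 0.0
    for f in range(folds):
        test = range(f, n, folds)  # every folds-th point is held out
        held = set(test)
        train = [i for i in range(n) if i not in held]
        beta = fit_ols([design[i] for i in train], [y[i] for i in train])
        for i in test:
            pred = sum(b * v for b, v in zip(beta, design[i]))
            total += (y[i] - pred) ** 2
    return total / n


def best_subset_cv(X, y, max_size):
    """Best subset for each size 1..max_size, ranked by cross-validated error."""
    n_vars = len(X[0])
    result = {}
    for k in range(1, max_size + 1):
        result[k] = min((cv_error(X, y, cols), cols)
                        for cols in itertools.combinations(range(n_vars), k))
    return result


# Toy data (hypothetical): 50 points, 8 variables, response driven by variables 0 and 5.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
y = [2 * row[0] + 1.5 * row[5] + random.gauss(0, 0.5) for row in X]

result = best_subset_cv(X, y, 3)
for k, (err, cols) in result.items():
    print(k, cols, round(err, 3))
```

Note that picking the minimum cross-validated error over many subsets is itself optimistic, which ties back to the earlier point about false discovery with this many candidate models and so few data points.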