I am not a statistician, but I am interested in how those who are would approach the following problem.

What would you do if you had to run a multiple regression with a very large(!) number of independent variables?

Suppose you had 200 variables. The number of possible models would be huge (2^200). Would stepwise regression even be a possibility, or a smart approach? A "best subset analysis", i.e., an exhaustive search, would be computationally infeasible. Are there any other methods that might be used? At what point would they become infeasible?
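
To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that best subset search fits every subset of the p variables, and that stepwise means plain forward selection (add one variable per step, never remove one), so the stepwise count is only an upper bound for that particular variant. The function names are my own, purely for illustration.

```python
def best_subset_models(p: int) -> int:
    """Candidate models in an exhaustive search: every subset of p variables."""
    return 2 ** p


def forward_stepwise_fits(p: int) -> int:
    """Upper bound on model fits for plain forward selection.

    Step 1 tries p variables, step 2 tries the remaining p - 1, and so on,
    giving p + (p - 1) + ... + 1 = p * (p + 1) / 2 fits in total.
    """
    return p * (p + 1) // 2


if __name__ == "__main__":
    for p in (15, 20, 200):
        print(f"p={p}: best subset = {best_subset_models(p)}, "
              f"forward stepwise <= {forward_stepwise_fits(p)}")
```

Under these assumptions, the exhaustive count doubles with every added variable, while the forward selection count grows only quadratically.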

I've come across PCA, which I think allows one to reduce the number of variables before(?) the regression, but I am interested in the case where you might be stuck with 200 variables.

I suppose best subset analysis becomes difficult beyond perhaps 15-20 variables?

Stepwise regression at ? variables?

Others at ?

Related to this problem, I also assume that standard/commercial software (Minitab, SAS, SPSS, R, etc.) has limitations on the number of variables that can be used for generalized least squares multiple regression. I was unable to find this information. Searching the web for "best subset analysis" together with the software names yielded nothing, so if anyone has some knowledge about this that they could share, I would be grateful.

Any pointers, links, or references would be welcome too. I hope the question as asked is clear, or at least the intention behind it.

Thank you.