LASSO

#21
If you don't have a holdout set, you should not turn around and plug the LASSO-selected terms into an OLS model; you would use the output from the LASSO model itself. Reusing the same data for both selection and estimation makes the OLS standard errors and p-values too optimistic. If you do have a holdout set, you can take the variables the LASSO selected, refit them by OLS on the holdout data, and use those OLS estimates. Without a holdout set, you can't have your cake and eat it too.
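To make the workflow concrete, here is a minimal sketch in Python with NumPy, not SAS. Everything in it is an assumption for illustration: the simulated data, the penalty `lam=0.3`, the 400/600 split, and the small coordinate-descent LASSO written out by hand. It selects variables on the training rows only, then refits OLS on the holdout rows using just the selected columns.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes the columns of X are standardised and y is centred."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves feature j out of the fit
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b

# Simulated data (hypothetical): only columns 0 and 1 truly matter
rng = np.random.default_rng(0)
n, p = 1000, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# Random split: 400 training rows, 600 holdout rows
idx = rng.permutation(n)
train, hold = idx[:400], idx[400:]

# Standardise with training statistics only, then run the LASSO on training rows
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
b_lasso = lasso_cd((X[train] - mu) / sd, y[train] - y[train].mean(), lam=0.3)
selected = np.flatnonzero(b_lasso)  # columns whose coefficient was not shrunk to zero

# Refit by OLS on the holdout rows, using only the selected columns
A = np.column_stack([np.ones(len(hold)), X[hold][:, selected]])
b_ols, *_ = np.linalg.lstsq(A, y[hold], rcond=None)

print("selected columns:", selected)
print("OLS estimates on holdout (intercept first):", b_ols)
```

The point of the design is that the holdout rows never influence which variables get picked, so the OLS estimates on them are not contaminated by the selection step.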
 

noetsi

Fortran must die
#22
I have thousands of cases, so I could create a holdout data set. Is there any particular rule for how you choose what is and is not in the holdout data set? (In time series it's the most recent data.) I know how to create a holdout data set in GLMSELECT, but not how to tell SAS which specific data (as opposed to how much) to hold out.

LASSO reduces some terms to zero so I guess those variables drop out if you use the LASSO results.
 
#23
Yeah, if you don't have time series data, a random sample is fine, say 40% train and 60% holdout. It is that easy. And correct, terms shrunk to zero get dropped.
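For what it's worth, the two ways of choosing the holdout rows mentioned in this thread could be sketched like this in Python (not SAS). The function names `random_split` and `time_split`, the seed, and the toy rows are all made up for illustration.

```python
import random

def random_split(rows, train_frac=0.4, seed=42):
    """Shuffle a copy of rows and cut it into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def time_split(rows, holdout_n, key):
    """Time series case: keep the most recent holdout_n rows (by key) as holdout."""
    ordered = sorted(rows, key=key)
    cut = len(ordered) - holdout_n
    return ordered[:cut], ordered[cut:]

# Toy data: 1000 rows identified by an integer "time" index
rows = list(range(1000))

train, hold = random_split(rows)
print(len(train), len(hold))  # 400 600

ts_train, ts_hold = time_split(rows, holdout_n=100, key=lambda r: r)
print(ts_hold[0], ts_hold[-1])  # 900 999
```

Either way, the split is decided once, before any model is fit, and the holdout rows stay untouched during selection.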
 

noetsi

Fortran must die
#24
GLMSELECT and GLM do not have the various tests of regression assumptions that PROC REG has. Can you run GLMSELECT, choose a model, and then test the regression assumptions with PROC REG for those variables?
 
#26
I use R for it. But I would guess it uses Breiman's 1 SE rule, or you should select it. That rule picks the most regularized model (the one with the fewest terms) whose cross-validated error is within one standard error of the best model's error. Frank Harrell gave a talk last week where he said everyone uses the defaults in LASSO, which may not be the most prudent in all cases. He was making a plug for Bayesian modeling, where you may have to give more consideration to content instead of hoping a procedure will just generate the best model for you.
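A rough sketch of the 1 SE rule in Python, assuming you already have the cross-validation results for each penalty value. The helper `one_se_rule` and the numbers in `cv_path` are invented for illustration and do not come from any real fit.

```python
def one_se_rule(path):
    """path: list of (lam, mean_cv_error, se_cv_error) tuples.
    Returns the largest lam whose mean error is within one SE of the minimum."""
    best = min(path, key=lambda t: t[1])   # row with the lowest mean CV error
    threshold = best[1] + best[2]          # minimum error plus its standard error
    eligible = [t for t in path if t[1] <= threshold]
    return max(eligible, key=lambda t: t[0])[0]

# Made-up cross-validation results, for illustration only
cv_path = [
    (0.01, 1.30, 0.05),
    (0.05, 1.10, 0.05),
    (0.10, 1.00, 0.06),  # lowest mean error
    (0.20, 1.04, 0.06),  # still within 1.00 + 0.06
    (0.40, 1.20, 0.07),
]

print(one_se_rule(cv_path))  # 0.2
```

A larger penalty means more coefficients shrunk to zero, so this picks a sparser model than the one at the minimum-error penalty of 0.10.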
 

noetsi

Fortran must die
#28
Given that I know very little about Bayesian approaches (I struggle to learn frequentist methods), I probably won't do that. :p

thanks