range of tuning parameters

#1
Hi All,

I have just started studying advanced statistics, and there is one thing I have not been able to understand for a long time: how can we pre-determine the range of tuning parameters to test in cross-validation if we do not want R to select them automatically? I saw that some papers use a log scale, but I am not so clear about that. In that case, how do we ensure the precision of the best value, since the test is done on a very sparse grid?

Can someone explain this? If the strategy is not universal, you can take ridge or lasso as examples. It would also be good to have some links to the literature.

Many thanks
 

noetsi

Fortran must die
#2
You can't have the same topic on two boards. I thought it belonged here. If you want me to move it to the R board instead, I will.
 
#4
I believe you will get better answers from https://kaggle.a.ssl.fastly.net/forums :) For what it is worth, I first run a broad grid (1, 10, 100, etc.) and then search again more narrowly, for example 7:15.
Thanks for your answer. Then the question becomes how to choose the right sample of tuning parameters to correctly calculate the standard deviation of the MSE; presumably it is calculated as the root mean square of the deviations from the mean MSE.
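The coarse-then-fine search described in the quote can be sketched roughly as follows. This is only an illustration in plain Python, not what any R package does internally: `log_grid`, `refine`, and `toy_cv_error` are hypothetical names, and the toy error curve stands in for a real cross-validated MSE. The idea is that a sparse log-spaced grid locates the right order of magnitude, and a second, denser log grid between the neighbours of the best coarse value recovers the precision.

```python
import math

def log_grid(lo, hi, n):
    """Return n values evenly spaced on a log10 scale from lo to hi (inclusive)."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]

def refine(grid, scores, n_fine=9):
    """Build a finer log grid between the neighbours of the best coarse value."""
    best = min(range(len(grid)), key=lambda i: scores[i])
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, len(grid) - 1)]
    return log_grid(lo, hi, n_fine)

# Hypothetical CV error curve with its minimum at lambda = 10;
# in practice this would be the cross-validated MSE of the model.
def toy_cv_error(lam):
    return (math.log10(lam) - 1.0) ** 2

# Pass 1: sparse grid spanning six orders of magnitude.
coarse = log_grid(1e-3, 1e3, 7)
coarse_scores = [toy_cv_error(lam) for lam in coarse]

# Pass 2: denser grid between the neighbours of the coarse winner.
fine = refine(coarse, coarse_scores)
fine_scores = [toy_cv_error(lam) for lam in fine]
best_fine = fine[min(range(len(fine)), key=lambda i: fine_scores[i])]
```

The second pass can be repeated if more precision is needed, although the CV error curve is often flat enough near its minimum that one refinement is plenty.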
 

Lazar

Phineas Packard
#5
I would strongly recommend using caret for tuning; the MSE and its spread are returned via the train() function.
 
#6
I would strongly recommend using caret for tuning; the MSE and its spread are returned via the train() function.
Sorry, I did not get that. Do you mean some function in R? I am not familiar with the package. My purpose is to apply the 1se strategy suggested by R. Tibshirani, and this seems to require a good estimate of the se. My understanding is that if, for example, I use 50 different tuning parameters in CV, the se is calculated as the simple standard deviation of the MSE over these 50 samples. Is that correct? If so, it may be better to sample the parameter with less skewness and a proper step.
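For reference, in the usual formulation of the one-standard-error rule the se is taken across the CV folds at each candidate value, not across the candidate values themselves, so the spacing of the grid does not enter the se. A minimal sketch of that calculation, assuming a single-predictor ridge without intercept and simulated data (all function names here are hypothetical, and this is plain Python rather than any package's implementation):

```python
import math
import random
import statistics

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a single predictor without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_curve(xs, ys, lambdas, k=5):
    """Return (mean MSE, standard error) per lambda, both taken across the k folds."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]
    out = []
    for lam in lambdas:
        fold_mse = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(xs)) if i not in held_out]
            b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
            fold_mse.append(statistics.mean((ys[i] - b * xs[i]) ** 2 for i in test_idx))
        # Standard error of the mean: spread of the k fold errors over sqrt(k).
        se = statistics.stdev(fold_mse) / math.sqrt(k)
        out.append((statistics.mean(fold_mse), se))
    return out

def one_se_choice(lambdas, curve):
    """Largest lambda whose mean CV error is within one SE of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: curve[i][0])
    threshold = curve[best][0] + curve[best][1]
    return max(lam for lam, (m, _) in zip(lambdas, curve) if m <= threshold)

# Simulated data (an assumption for illustration): y = 2x + noise.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(100)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

lambdas = [10 ** e for e in (-2, -1, 0, 1, 2)]
curve = cv_curve(xs, ys, lambdas)
lam_min = lambdas[min(range(len(lambdas)), key=lambda i: curve[i][0])]
lam_1se = one_se_choice(lambdas, curve)
```

Here `lam_min` minimises the mean CV error, while `lam_1se` is the most regularised value still within one se of that minimum, which is the choice the rule recommends.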

Thanks again.
 

Lazar

Phineas Packard
#7
Yes, but the packages glmnet or caret (where the train function can be found) will do all this for you. No need to write your own script to do this. In addition, caret at least (and I think glmnet) incorporates parallel processing to get things done faster (and make no mistake, CV of the sort you are suggesting takes forever on reasonably sized datasets).