How to know which ML algorithm to choose

#1
I've been reading about and practicing R and various machine learning algorithms for the past several months. One thing I'm still not sure about is how you know which algorithm to choose. I know the difference between supervised and unsupervised learning, and between classification and regression. However, once I get past that, how do I home in on the best algorithm?

For example, if I have a classification problem, I know not to use regression, but I still have a choice of trees, support vector machines, KNN, naive Bayes, etc. Is it just a matter of running them all on the training and validation sets and picking the one that is the most accurate? Or are there certain characteristics of a problem that fit better with certain algorithms?

I googled a bit but could not find this question being answered.

Thanks
 

hlsmith

Not a robit
#2
Good question, and one that seems to get neglected in the literature. My understanding is that a typical approach is to fit all of the candidate models using cross-validation and then, for classification problems, select the model with the best cross-validated accuracy. I would imagine that as your familiarity with the different models increases, you will become better at matching certain models to certain kinds of problems. You can probably also throw regularized models into the mix.
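A minimal sketch of that compare-by-cross-validation idea, written in Python with only the standard library so it runs anywhere. Everything in it is an assumption made for illustration and not something from this thread: the toy two-cluster data, the two candidate models (a majority-class baseline and a small k-nearest-neighbours classifier), and the helper names `majority_fit`, `knn_fit`, and `cv_accuracy`. In practice you would more likely reach for a package that does the resampling for you.

```python
import random
from collections import Counter


def majority_fit(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def knn_fit(X, y, k=3):
    """k-nearest neighbours using squared Euclidean distance."""
    def predict(row):
        nearest = sorted(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)),
        )[:k]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict


def cv_accuracy(fit, X, y, folds=5, seed=0):
    """Shuffle once, then score each held-out fold with a model
    trained on the remaining folds. Returns overall accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]            # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy data (assumed): two well-separated Gaussian clusters, 50 points each.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50

# Score every candidate with the same folds, then keep the best one.
candidates = {"majority": majority_fit, "knn": knn_fit}
scores = {name: cv_accuracy(fit, X, y) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```

Using the same seed for every candidate means each model is scored on identical folds, so the comparison is not skewed by a lucky split. Including a trivial baseline also shows how much a real model actually adds.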


The next level is stacking models, since they all have strengths and weaknesses and it is hard to imagine that they are all correctly specified. I plan to do the same thing you are doing this summer, if I get some time. Perhaps we can brainstorm together. Until then I am swamped with work.
 
#3
Thanks for the reply. I would definitely welcome collaboration. Apparently I am not familiar enough yet, because when I look at a problem, I still don't have a hunch about which algorithms I must include or which ones I know I can exclude. For example, when I review the Titanic reports on Kaggle, I find myself asking: why did he choose trees?

Ah well, will continue on.
 

hlsmith

Not a robit
#4
What program are you using, R or Python? Yeah, this summer I hope to work through all of the basic ML models. That way I will at least know what they consist of and better understand their output and results. I have a basic overview of most of them, but have not gotten anywhere near neural networks yet.
 
#5
Right now, I'm using R. Python will be next eventually, but small steps, I suppose. For me, it is much easier to go through examples first and then go back afterwards to understand the algorithms.