Compute prediction intervals in Random Forest Regression


I am trying to use Random Forest Regression to better predict the output of a manufacturing process.

Until now I was using PLS regression, but Random Forest Regression turned out to give better results (lower RMSE on the test dataset). My main concern is that I cannot give a prediction interval alongside the prediction, as is usually done with classical linear regression.

Is there a way (gold standard or not) to compute prediction intervals for random forest regression? This is my main interest, because I need to quantify the risk that my next output falls outside the specification limits (which means pure waste and cost).

Thanks for your help.



TS Contributor
What is the problem that you are trying to solve? Beyond prediction, that is. What is unique about this process that you wouldn't use SPC to control the process?


Omega Contributor
Yes, you are running up against the key issue with many machine learning algorithms when you want precision estimates or inferential statistics. Most algorithms give you fantastic predictions, but methods for CIs are not well established in the literature. I don't know of a method off the top of my head. If you can find links to one, I would be happy to review and explore it with you. Traditionally, if you can't get precision estimates you default to bootstrapping, but I would imagine you would then be bootstrapping inside the RF (that is what bagging does) and then doing it again on multiple other samples, which just seems like the same thing.
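
To make the bagging point concrete: each tree in the forest is already fit on a bootstrap sample, so every training row has "out-of-bag" predictions from the trees that never saw it. A rough sketch of one way to use that, assuming Python with scikit-learn (nobody in the thread has said which software is in play) and made-up data: take the quantiles of the out-of-bag residuals and add them to the new prediction. The function name and the synthetic process below are hypothetical, and this is an empirical shortcut rather than an established gold standard. It assumes the error distribution is roughly the same everywhere in the input space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def oob_prediction_interval(X_train, y_train, X_new, alpha=0.05, seed=0):
    """Point predictions plus an interval built from out-of-bag residual quantiles.

    Hypothetical helper for illustration; assumes residuals are exchangeable
    across the input space (no strong heteroscedasticity).
    """
    forest = RandomForestRegressor(
        n_estimators=200, oob_score=True, random_state=seed
    )
    forest.fit(X_train, y_train)

    # oob_prediction_[i] averages only the trees whose bootstrap sample
    # did not contain training row i.
    residuals = y_train - forest.oob_prediction_
    lo_q, hi_q = np.quantile(residuals, [alpha / 2, 1 - alpha / 2])

    point = forest.predict(X_new)
    return point, point + lo_q, point + hi_q


# Synthetic stand-in for process data (an assumption, not the poster's data).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=600)

point, lower, upper = oob_prediction_interval(X[:450], y[:450], X[450:])

# Fraction of held-out outputs that land inside their interval.
coverage = np.mean((y[450:] >= lower) & (y[450:] <= upper))
print(f"empirical coverage on held-out rows: {coverage:.2f}")
```

Checking the empirical coverage on rows the forest never saw, as in the last lines, is the sanity test that matters: if it is far from the nominal 95%, the constant-width assumption is probably not holding.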

If you can't find a solution, I am not sure RF is the right fit for the problem. I am not sure how much data you have, but perhaps you could get some type of cross-validated standard errors for predictions.
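
A minimal sketch of what that cross-validation idea could look like, again assuming Python with scikit-learn and invented data. Every row is predicted by a model that did not train on it, the held-out residuals give an RMSE as a crude standard error, and shifting the next prediction by each residual gives an empirical estimate of the out-of-spec risk asked about in the original question. The function name, spec limits, and data are all hypothetical, and the risk estimate is only as good as the assumption that future errors look like past held-out errors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict


def out_of_spec_risk(X, y, x_next, lower_spec, upper_spec, seed=0):
    """Estimate P(next output outside spec) from cross-validated residuals.

    Hypothetical helper for illustration only.
    """
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)

    # Each row's prediction comes from a forest fit on the other four folds.
    held_out_pred = cross_val_predict(model, X, y, cv=cv)
    residuals = y - held_out_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Refit on all data for the actual next-run prediction.
    model.fit(X, y)
    point = float(model.predict(np.asarray(x_next).reshape(1, -1))[0])

    # Plausible outcomes for the next run: prediction plus each past error.
    simulated = point + residuals
    risk = float(np.mean((simulated < lower_spec) | (simulated > upper_spec)))
    return point, rmse, risk


# Synthetic stand-in for process data (an assumption, not the poster's data).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=400)

point, rmse, risk = out_of_spec_risk(
    X, y, x_next=[1.0, 0.5, 0.0], lower_spec=0.8, upper_spec=2.2
)
print(f"prediction={point:.2f}  cv_rmse={rmse:.2f}  out_of_spec_risk={risk:.2%}")
```

Using the residuals directly, rather than a normal approximation around the RMSE, avoids assuming the errors are Gaussian, at the cost of needing enough data for the tails to be populated.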

Tell us more about your PLS approach - I have not used it (I am guessing you mean partial least squares). Also, are you doing RF and PLS in SAS as well?