Extreme Gradient Boosting for Survival Analysis?

#1
A bit of a theoretical question here. A colleague of mine built an XGBoost model to estimate the probability that a software deal closes. Each deal is rescored every day, and its inputs are updated whenever the deal terms change. However, the estimates are extremely unstable from day to day, with a standard deviation of about 5%. Is this approach valid, or should a semi-parametric model such as Cox proportional hazards be used instead, since time to close is continuous in nature? Is there a way to use XGBoost to estimate how likely a deal is to close within a month or a quarter and still score it daily?
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I know people use some of these machine learning algorithms, or stacked ensembles, in survival analysis. I haven't tried them for this, mainly because I was unsure whether they can generate the quintessential survival plots, which are important to me. These algorithms do a phenomenal job with prediction but are normally black boxes, so I may still lean toward Cox regression for its interpretability. I did not follow your last question. Are you asking whether it can handle discrete time?
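
On the survival plot point, a minimal sketch, assuming the model can be made to output a per-period hazard (the probability that a still-open deal closes in that period): the survival curve is then just the running product of (1 - hazard). The function name and the weekly hazards below are made up for illustration, and the sketch ignores competing outcomes such as a deal being lost.

```python
from itertools import accumulate


def survival_curve(hazards):
    """Convert per-period hazards h_1..h_T into S(t) = prod_{k<=t} (1 - h_k).

    Here S(t) is read as the probability that a deal is still open
    after period t. Hypothetical helper, not part of any library.
    """
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError(f"hazard must lie in [0, 1], got {h}")
    # Running product of the per-period "stays open" probabilities.
    return list(accumulate((1.0 - h for h in hazards), lambda s, q: s * q))


# Made-up weekly hazards for one deal (assumption, not real data).
hazards = [0.10, 0.20, 0.50]
curve = survival_curve(hazards)

# Probability the deal has closed by the end of each week.
closed_by = [1.0 - s for s in curve]
```

Plotting `curve` against the period index gives a step curve of the usual survival-plot shape, so a black-box model does not rule the plot out as long as it produces hazards per period.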
 
#3
I'm wondering whether it can handle discrete time, and also whether it is appropriate for panel data of this nature. If you train on outcomes that are measured at the end of a quarter but score observations daily, it seems the model wouldn't account for the dependencies in how a sales opportunity evolves from one observation to the next. It also doesn't seem appropriate to score daily when we trained on quarterly outcomes.
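
One way to line up training with daily scoring, sketched under assumptions: label every daily snapshot with "did the deal close within the next N days of this snapshot?" and drop snapshots whose N-day window is not fully observed yet. The function name, the data layout, and the example deals below are hypothetical. A `None` close date is taken to mean "not closed as of `data_end`".

```python
from datetime import date, timedelta


def label_snapshots(snapshots, close_dates, data_end, horizon_days=30):
    """Attach a fixed-horizon label to each (deal_id, snapshot_date) row.

    label = 1 if the deal closed in (snapshot_date, snapshot_date + horizon],
    label = 0 if that whole window was observed without a close,
    and the row is dropped if the window runs past data_end (censored).
    """
    horizon = timedelta(days=horizon_days)
    rows = []
    for deal_id, snap in snapshots:
        closed = close_dates.get(deal_id)
        window_end = snap + horizon
        if closed is not None and closed <= snap:
            continue  # snapshot taken on or after the close: nothing to predict
        if closed is not None and closed <= window_end:
            rows.append((deal_id, snap, 1))
        elif window_end <= data_end:
            rows.append((deal_id, snap, 0))
        # else: outcome within the window is unknown, so skip the row
    return rows


# Made-up example data (assumption, not from the thread).
data_end = date(2024, 3, 31)
close_dates = {"A": date(2024, 2, 10), "B": None}
snapshots = [
    ("A", date(2024, 1, 1)),
    ("A", date(2024, 1, 20)),
    ("B", date(2024, 2, 1)),
    ("B", date(2024, 3, 15)),
]
rows = label_snapshots(snapshots, close_dates, data_end)
```

The labelled rows could then go to any binary classifier, XGBoost included. Because many rows come from the same deal, it seems safer to split train and validation sets by `deal_id` rather than by row, so that snapshots of one deal never sit on both sides. XGBoost also has `survival:cox` and `survival:aft` objectives, which may be worth a look if the time to close itself is the target.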