What model to choose: Survival analysis?

luup

New Member
#1
Hi all,

I am currently writing my master's thesis and don't really know what model to choose. Here are the details:
  • dependent variable = time in months to a certain event; the lower the better
  • overall 7 independent variables = 4 are binary (yes/no), 3 are trinary (a/b/c)
  • the data does not have the character of a time series but is a snapshot: I asked in a survey how long it took to get to the event (Y) and what were the 7 circumstances (X1 to 7)
  • sample size is 75
I read about survival analysis but I am not sure what to make of it. Does anyone have a suggestion on how to proceed?

Thanks a lot in advance!
 

Karabiner

TS Contributor
#2
I asked in a survey how long it took to get to the event (Y) and what were the 7 circumstances (X1 to 7)
What precisely is your research question? Maybe it would be useful to explain the actual study,
i.e. what is the topic of this research, who are the people studied, what do you mean by "certain
event", what are the 7 "circumstances"?

With kind regards

Karabiner
 

luup

New Member
#3
My data come from my colleagues (secondary data sources), who provide me with the data for their customers, both for the dependent and the independent variables. The customers are companies that are deploying a software tool.

My goal is to understand the success factors that led to a rapid deployment. The dependent variable (in months) captures the time it took the company to reach a certain threshold: Y=5 if the company needed 5 months, Y=10 if the company needed 10 months. The lower the number, the better. I can't disclose the threshold or the exact topic here, because the data are sensitive and I am bound by an NDA.

The independent variables center on the customer setup: executive ownership (yes/no), implementation (internal/external/mix), and so on. All of those variables are either binary (yes=1, no=2), or trinary (A=1, B=2, C=3).

Hope this makes sense?

Thanks
 

hlsmith

Not a robit
#4
NDA: notorious dudely astronaut or nondisclosure? Seems like your time may be discrete, so keep that in mind. The biggest question is what the event rate is. Even if it is 50%, so about 37 events, you definitely have sparsity issues, meaning a risk of overfitting the model if you use all of those covariates, which suck up around ten degrees of freedom. Probably too many predictors.
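
For what it's worth, here is a rough sketch of that arithmetic in Python. It assumes each categorical predictor is dummy coded (k levels cost k - 1 coefficients) and uses the 50% event rate purely as a hypothetical; the variable names are made up for illustration.

```python
# Number of levels per predictor, as described in the thread:
# 4 binary (yes/no) and 3 three-level (a/b/c) variables.
levels = [2, 2, 2, 2, 3, 3, 3]

# With dummy coding, a predictor with k levels needs k - 1 coefficients.
df_used = sum(k - 1 for k in levels)

sample_size = 75
assumed_event_rate = 0.5  # hypothetical value, not known from the data
events = int(sample_size * assumed_event_rate)  # rounds 37.5 down to 37

# Events available per estimated coefficient.
events_per_parameter = events / df_used

print(df_used, events, round(events_per_parameter, 1))
```

That works out to fewer than 4 events per coefficient, well under the roughly 10 events per parameter that is often quoted as a rule of thumb, which is why trimming the predictor list looks sensible.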