Cox model or not?

#1
Hello everyone,


I need your help, please. I have a large dataset with 2000 observations (machines) and variables such as the name, the start date, the age and some other characteristics.
These machines have a short lifespan and I have to find out why: which variables have an effect on the lifespan?

To do this, I need to focus only on machines that have lived longer than 1200 days and find the cause of death.
I had thought of using a Cox model, but all the machines that have lived more than 1200 days are 'dead'; none are alive. For censored data I need machines still alive at the end of the study, and the ones still working are currently between 20 and 600 days old.
Can I use them in my Cox model, knowing that all the machines older than 1200 days are dead?
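A minimal sketch of how the survival data could be built, assuming each machine has a start-up date, a death date (or none if still running) and one common study cutoff date. The function name `to_survival` and all dates are hypothetical. Time is counted from each machine's own start-up date, so machines started in different years share the same time origin, and machines still running at the cutoff are right-censored at their current age.

```python
from datetime import date
from typing import List, Optional, Tuple


def to_survival(start: date, death: Optional[date], cutoff: date) -> Tuple[int, int]:
    """Return (duration_in_days, event) for one machine.

    event = 1: the machine died on or before the cutoff date.
    event = 0: it was still running at the cutoff (right-censored).
    """
    if death is not None and death <= cutoff:
        return (death - start).days, 1
    return (cutoff - start).days, 0


# Hypothetical machines: (start-up date, death date or None if still running)
machines = [
    (date(2010, 3, 1), date(2014, 1, 15)),  # died after 1416 days
    (date(2021, 6, 1), None),               # still running at the cutoff
    (date(2022, 9, 1), None),               # still running at the cutoff
]
cutoff = date(2023, 1, 1)  # hypothetical end-of-study date

survival_data: List[Tuple[int, int]] = [to_survival(s, d, cutoff) for s, d in machines]
print(survival_data)  # [(1416, 1), (579, 0), (122, 0)]
```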


Thank you in advance for your help.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I haven't had this scenario in practice, but in theory everyone can have the event, since everyone dies. I think you can use all of your data; the model will help discern the hazards (rates) per predictor.

@Miner, any suggestions?
 
#3
Thanks for your response, @hlsmith.
I was asked to focus only on the longest-lived machines (over 1200 days).
My machines were not all started up in the same year, so they do not enter the study at the same time.
As my dead machines are over 1200 days old and the ones still alive are under 600 days old, can I still use a Cox model despite this age difference between the living and dead machines?
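Whether the age gap matters can be seen from how the Cox partial likelihood is built: each death is compared only with the machines still at risk at that time, i.e. those whose follow-up time is at least as long. A minimal sketch with hypothetical times (the helper `risk_sets` is illustrative, not from a library) shows that machines censored before 600 days never appear in the risk set of a death after 1200 days, so they add nothing to the estimate.

```python
from typing import Dict, List


def risk_sets(times: List[int], events: List[int]) -> Dict[int, List[int]]:
    """Map each distinct death time t to the indices of machines at risk (t_j >= t)."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    return {t: [j for j, t_j in enumerate(times) if t_j >= t] for t in death_times}


# Hypothetical data: four deaths after 1200 days, three running machines (censored)
times = [1250, 1400, 1600, 2100, 20, 300, 600]
events = [1, 1, 1, 1, 0, 0, 0]

sets = risk_sets(times, events)
used = {j for members in sets.values() for j in members}
censored = {j for j, e in enumerate(events) if e == 0}
print(sets)
print("censored machines ever at risk:", used & censored)  # set()
```

Including them is therefore harmless but uninformative in this configuration; they would only matter once some of them are followed beyond the first death time.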
 
#4
If you are only using dead machines without censored lifetimes, I can't see much advantage in Cox or survival analysis at all. Could you not look at determinants of longer life using correlation, Mann-Whitney or linear regression?
But ... do you not have machines that failed before 1200 days? Surely you'd want to compare characteristics between those earlier failures and those that survive longer than 1200 days. And if there are censored machines that don't fail before 1200 days, they'd need to be included, and then survival analysis is needed. I don't understand why you are advised to look only at those that survive beyond 1200 days before failing.
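If only uncensored lifetimes are compared, the Mann-Whitney test mentioned above can be computed directly. A minimal sketch in plain Python, assuming two groups of machines defined by a hypothetical binary characteristic; it uses mid-ranks for ties and a normal approximation without tie correction, so for small samples or many ties an exact or tie-corrected version (for example `scipy.stats.mannwhitneyu`) would be preferable.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """U for sample a, z score and two-sided p (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return u1, z, p


# Hypothetical lifetimes (days) of machines with and without some feature
with_feature = [1300, 1450, 1500, 1700, 2000]
without_feature = [1210, 1250, 1280, 1350, 1400]
u, z, p = mann_whitney_u(with_feature, without_feature)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```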
 
#5
Thank you for your reply, @statlimp.
I was asked to look only at machines that lived more than 1200 days before dying, to try to find the reason for their death, because when they die before 1200 days the death is generally caused by humans.
So I took the dead machines that lived more than 1200 days, and as censored data I used the machines still alive, which are between 20 and 600 days old.
What do you think?

The explanatory variables are correlated with each other.
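Since the explanatory variables are correlated, it may help to look at pairwise correlations before putting them together in one model. A minimal sketch with hypothetical variable names and values; the 0.8 cut-off is an arbitrary illustration, not a rule.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical explanatory variables measured on the same six machines
covariates = {
    "age": [2.0, 3.5, 1.0, 4.0, 2.5, 3.0],
    "load": [50.0, 70.0, 35.0, 80.0, 55.0, 65.0],
    "temperature": [20.0, 22.0, 25.0, 21.0, 24.0, 23.0],
}

for a, b in combinations(covariates, 2):
    r = pearson(covariates[a], covariates[b])
    flag = "  <- strongly correlated" if abs(r) > 0.8 else ""  # arbitrary cut-off
    print(f"{a} vs {b}: r = {r:.2f}{flag}")
```

Strongly correlated pairs tend to make individual coefficients unstable, in a Cox model as in ordinary regression; keeping one variable from each such pair is one simple option.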
 

hlsmith

#6
Yeah, your scenario seems a little off. If you just use those that made it to 1200 days, isn't there some type of survival bias?

So machines can be taken offline for other, independent reasons? So there is a competing event?
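Selecting only machines that reached 1200 days is a form of left truncation. One standard way to handle it is delayed entry: each machine enters the risk set at 1200 days (or later, if its records only start later), which estimates survival conditional on having reached that age. A minimal Kaplan-Meier sketch with hypothetical data; libraries offer the same idea (lifelines, for example, accepts an `entry` argument in `KaplanMeierFitter.fit`).

```python
from typing import List, Tuple


def km_delayed_entry(entry: List[float], exit_: List[float],
                     event: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate with delayed entry (left truncation).

    A machine is counted at risk at time t only if entry < t <= exit.
    Returns (time, survival) at each distinct death time.
    """
    death_times = sorted({t for t, e in zip(exit_, event) if e == 1})
    surv = 1.0
    curve = []
    for t in death_times:
        at_risk = sum(1 for a, b in zip(entry, exit_) if a < t <= b)
        deaths = sum(1 for b, e in zip(exit_, event) if b == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical machines that all reached 1200 days; the third one is only
# observed from age 1350 (its records start later), so it enters at 1350
entry = [1200, 1200, 1350, 1200, 1200]
exit_ = [1300, 1400, 1400, 1500, 1600]
event = [1, 1, 1, 0, 1]  # the machine leaving at 1500 is still running (censored)

for t, s in km_delayed_entry(entry, exit_, event):
    print(t, round(s, 3))
```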
 
#7
I thought the same thing, that there might be different causes of death and therefore different events, but I was told that if a machine is unplugged by a human, it is because it no longer has any chance of survival: instead of letting the machine shut down by itself, they unplug it themselves.

That's why I wanted to know whether the Cox model is useful in my case (because there is a big difference in age between my living and dead machines).
 
#8
So machines either die after 1200 d, or are alive with follow-up < 600 d, or are unplugged < 600 d.
There are no natural deaths < 1200 d? You said "generally".
Is this right?
I'd consider all times < 600 d as censored.
But are you just looking at predictors of longevity for machine deaths > 1200 d, with no censored machines > 1200 d? If so, I still think non-survival methods might be best, ignoring the censored machines < 1200 d if machines never naturally die that young anyway.
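A minimal sketch of the non-survival route suggested here: among machines that died after 1200 days (all uncensored), regress lifetime on a characteristic with ordinary least squares. The data and the binary feature are hypothetical; with a single binary predictor the slope equals the difference in mean lifetime between the two groups. With several correlated covariates, a multiple-regression routine would be needed instead.

```python
from typing import List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical lifetimes (days) of machines that died after 1200 days,
# against a binary feature (1 = feature present, 0 = absent)
feature = [0, 0, 0, 1, 1, 1]
lifetime = [1300, 1400, 1500, 1600, 1700, 1800]

intercept, slope = ols_fit(feature, lifetime)
print(f"mean lifetime without feature = {intercept:.0f} d, difference = {slope:.0f} d")
```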
 

hlsmith

#9
It might not hurt to just fit these data quickly with a Kaplan-Meier curve to visualize the phenomenon and educate us. Visuals always help and can show whether cross-sectional methods, such as those proposed by @statlimp, are appropriate. Can you also present some aggregated counts for all these scenarios for us to contemplate?
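One way to produce the aggregated counts asked for here is a simple cross-tabulation of status (dead or still running) against the 1200-day threshold used in this thread. A minimal sketch with hypothetical records; the helper name `category` is illustrative.

```python
from collections import Counter
from typing import Tuple


def category(lifetime_days: int, dead: bool) -> Tuple[str, str]:
    """Group a machine by status and by the 1200-day threshold."""
    status = "dead" if dead else "alive"
    band = ">1200 d" if lifetime_days > 1200 else "<=1200 d"
    return status, band


# Hypothetical records: (lifetime or current age in days, dead?)
records = [(1300, True), (1500, True), (450, False), (20, False), (800, True), (1250, True)]

counts = Counter(category(t, d) for t, d in records)
for key in sorted(counts):
    print(*key, counts[key])
```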
 
#10
Hello,


I may have expressed myself badly: I have to focus only on machines that have lived more than 1200 days; those that have lived less than 1200 days are not of interest to us. Moreover, the machines that are currently alive are between 20 and 600 days old. My machines are in different companies. The oldest machines recorded in the study date from 1990, and the most recent from 2020.
The newer machines have some features that the earlier machines did not have.
Also, my variables are sometimes highly correlated with each other.
So I was asked to look at the machines that have lived longer than 1200 days and find out why they died, and I thought that the Cox model would be the most appropriate way to find an answer.
But I have a question: for the Cox model I need to build a survival object (working and dead machines). As all my machines over 1200 days are dead, can I add the living machines under 1200 days as censored data or not, given the big difference between the age at death and the age of the machines still alive?
 
#11
I don't understand: which values do you want?
 
#13
Hello,
There has been a change in my database and now it looks a bit better.

If I focus on my machines that have lived more than 1200 days, I have 1000 dead machines and only 5 still alive.
Can we work with the Cox model with 1000 dead machines and only 5 still alive?
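Nothing in the Cox model requires a minimum share of censored machines, so 1000 deaths and 5 running machines can be analysed; the 5 running machines simply stay in the risk sets up to their current age. A minimal sketch of how a one-covariate Cox model is fitted (Newton-Raphson on the Breslow partial likelihood), with hypothetical data and helper names; in practice a library routine such as R's `survival::coxph` or lifelines' `CoxPHFitter` would be used, since they handle several covariates, tie methods and standard errors.

```python
import math
from typing import List, Tuple


def cox_terms(beta: float, times: List[float], events: List[int],
              x: List[float]) -> Tuple[float, float, float]:
    """Partial log-likelihood, score and information for one covariate (Breslow ties)."""
    ll = score = info = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i == 0:
            continue  # censored machines only appear inside risk sets
        risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in risk)
        s1 = sum(w * xj for w, xj in risk)
        s2 = sum(w * xj * xj for w, xj in risk)
        ll += beta * x_i - math.log(s0)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return ll, score, info


def cox_fit_1d(times, events, x, max_iter: int = 50) -> float:
    """Newton-Raphson with step halving; returns the estimated log hazard ratio."""
    beta = 0.0
    for _ in range(max_iter):
        ll, score, info = cox_terms(beta, times, events, x)
        if abs(score) < 1e-10 or info <= 0:
            break
        step = score / info
        # The partial log-likelihood is concave; halve the step if it overshoots
        while cox_terms(beta + step, times, events, x)[0] < ll and abs(step) > 1e-12:
            step /= 2
        beta += step
    return beta


# Hypothetical data: 8 deaths after 1200 days, 2 machines still running (censored)
times = [1250, 1300, 1350, 1400, 1450, 1600, 1700, 1800, 1900, 2100]
events = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
x = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # e.g. 1 = machine has a given feature

beta = cox_fit_1d(times, events, x)
print(f"log hazard ratio = {beta:.3f}, hazard ratio = {math.exp(beta):.3f}")
```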

I can't attach the images of my plots. Is there another way?

Thanks