Survival Model

#1
Hello everyone,

Here is my problem:
I have a database of 1500 machines located in 8 different companies. The goal is to find the cause of death of the machines that lived the longest (I have about 10 explanatory variables). To do this, I focus on the machines that lived more than 1500 days. Among them, 900 died after more than 1500 days and 1 is still alive at more than 1500 days.
I wanted to use a Cox model, which would give me, for each variable in the model, its effect on the risk of death.
However, I only have one machine still alive at 1500 days, and 900 are dead, so I would have almost no censored data in my model. Is this a problem?
Is there another method that would be more appropriate for this problem?

I would like to stress one thing:
Machines that died before 1500 days died because of an external factor. I only deal with machines that lived well (more than 1500 days), to find out why they died and which variables had an impact on their death, so that in the future I can build machines that live even longer.

Can someone help me find a correct method to use in this case?

Thanks.
 

Karabiner

TS Contributor
#2
I am not sure what should be the dependent variable here. All (except 1) "subjects"
are in the same outcome group. If you want to know what is associated with an outcome,
you need variability in the outcome. And seemingly you do not want to predict length
of survival, but just death yes/no. For that, you need non-dead machines. So you'd need some
shorter length of observation, so that a proportion of the machines is still alive at the end
of observation. Then you can analyse "factors associated with dead versus alive after [say]
500 days beyond day 1500". But I do not know whether this is compatible with your research
question(s).
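A minimal sketch of the "dead versus alive at a fixed horizon" coding suggested above. The `Machine` class, the field names, the horizon of 2000 days (1500 + 500) and the four example machines are all made up for illustration; they are not the poster's data. Note that a machine which died after the horizon counts as alive at the horizon, which is what gives the outcome some variability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    lifetime_days: float  # age at death, or current age if still running
    dead: bool            # True if the machine has already died


def status_at_horizon(m: Machine, horizon: float) -> Optional[int]:
    """Return 1 if dead by the horizon, 0 if alive at the horizon, None if unknown."""
    if m.dead:
        # A machine that died later than the horizon was still alive at the horizon.
        return 1 if m.lifetime_days <= horizon else 0
    # A running machine is only known to be alive at the horizon if it has reached it.
    return 0 if m.lifetime_days >= horizon else None


# Hypothetical example machines (illustration only).
machines = [
    Machine(1600, True),   # died before day 2000
    Machine(2300, True),   # died after day 2000, so alive at the horizon
    Machine(2100, False),  # still running, already past the horizon
    Machine(1800, False),  # still running, has not reached the horizon yet
]
horizon = 2000
labels = [status_at_horizon(m, horizon) for m in machines]
print(labels)
```

Machines labelled `None` cannot be classified at that horizon and would have to be left out of a dead-versus-alive comparison.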

What are these 10 explanatory variables?
 
#4
Thanks for your reply!


I know the status of all my machines, whether they are still running (alive) or dead.
I would like to know the cause of death of machines that are older than 1500 days, but in my database, machines older than 1500 days are no longer alive. So I don't yet know which variables affected the death of the machines. That's what I need to find out.
The explanatory variables I have are temperature, electrical resistance, length and width of the internal bar of the machine, ...

When I focus only on machines that are older than 1500 days, I only have dead machines and none that are older than 1500 days and still running.

I had thought of a Cox model, taking machines over 1500 days as dead and living machines of any age as censored data. What do you think of this? Could this skew the results?
I should also point out that the living machines are the most recent ones, and machines from the 2000s are different from recent machines. Temperatures, bar lengths and electrical resistance are not the same for a recent machine and an old one, for example.
 
#5
I have a database that contains 1500 machines
I only have one machine still alive at 1500 days
I only deal with machines that have lived well (more than 1500 days)
Then you have exactly one data point.
The rest of the data has been excluded by you.
Unless you have some other definition for "I only deal with" that doesn't mean "excluded by you"
 

Karabiner

TS Contributor
#6
I know the status of all my machines, whether they are still running (alive) or dead.
You said that one was alive, all others were dead.
I would like to know the cause of death of machines that are older than 1500 days but in my database, machines older than 1500 days are no longer alive.
If you want to compare dead with alive, in order to find out which factors distinguish between death
and survival, then you do not have enough living machines.
I had thought of a Cox model, taking machines over 1500d as dead and living machines of any age as censored data. What do you think of this? Could this skew the results?
It will tell you which factors are associated with earlier death within the sample.
Since all machines are dead (except one, which you could leave out) you could just
do a linear regression with "time to death" as dependent variable.
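A minimal sketch of that idea: ordinary least squares with "time to death" as the dependent variable and a single predictor. The temperature and lifetime values below are invented for illustration, and only one of the roughly 10 explanatory variables is used to keep the arithmetic visible.

```python
def simple_ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: operating temperature and age at death in days.
temperature = [60, 65, 70, 75, 80]
lifetime_days = [2400, 2200, 2000, 1800, 1600]

slope, intercept = simple_ols(temperature, lifetime_days)
print(f"each extra degree changes expected lifetime by {slope:.1f} days")
```

This only works because (almost) every lifetime is fully observed; with a substantial share of censored machines, the censored lifetimes would be underestimated and a survival model would be needed instead.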

With kind regards

Karabiner
 
#7
In fact, I would like to use a Cox model to answer my problem, and include only the machines over 1500 days in my model (I don't care about the rest).
But for the censored data, I don't see what I could use...
 
#8
A linear regression with only age at death as the variable to be explained? So Cox would be useless here?

I also thought of a Cox model where the censored data are the machines that died before 1500 days and those that are still working. Could this logic work?
 

Karabiner

TS Contributor
#9
"Those that are still working" at the end of the total observation period would still be 1 machine,
even if you included those machines which died before day 1500. You can do Cox regression
if you want to know which factors are associated with earlier or later death, respectively. I
thought that you could maybe just use linear regression, because all machines died except 1
and you do not have to build a model for a sample with censored data.

With kind regards

Karabiner
 
#10
OK, thank you very much for your reply.
So I will use a linear regression with lifetime as the dependent variable and temperature, electrical resistance, etc. as the explanatory variables?

With this method, I only get the regression coefficients of my explanatory variables. I would also like the hazard ratio for each variable, which I can't get from a linear regression...
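For reference, a Cox model does not need censored observations to produce hazard ratios: the hazard ratio for a variable is exp(coefficient), and the coefficient is estimated from the ordering of the death times. The sketch below maximises the Cox partial likelihood for one covariate by Newton-Raphson. It assumes no tied death times, uses invented data, and the function name `cox_fit_1d` is made up for this illustration; in practice a statistics package would be used.

```python
import math


def cox_fit_1d(times, events, x, iters=25):
    """Newton-Raphson on the Cox partial log-likelihood, one covariate, no ties."""
    beta = 0.0
    n = len(times)
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored rows contribute only through risk sets
            # Risk set: everyone whose time is at least the current death time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0              # score contribution
            info += s2 / s0 - (s1 / s0) ** 2    # information contribution
        if info == 0.0:
            break
        beta += grad / info
    return beta


# Hypothetical data: every machine died (no censoring at all).
times = [1510, 1600, 1750, 1900, 2100, 2400]   # age at death in days
events = [True] * 6
x = [3, 1, 2, 0, 1, 0]                         # made-up covariate, arbitrary units

beta = cox_fit_1d(times, events, x)
hazard_ratio = math.exp(beta)
print(f"coefficient = {beta:.3f}, hazard ratio per unit of x = {hazard_ratio:.2f}")
```

A hazard ratio above 1 means higher values of the covariate go with earlier death within the sample.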
 
#12
Ok, then just run the Cox regression...
OK, I have one last question.
If my censored data are all below 1500 days, is that a problem for the Cox model? My censored data will have lifetimes of less than 1500 days and the uncensored data will have lifetimes of more than 1500 days.

Isn't that problematic for the Cox model?
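One way to look at this question: in the Cox partial likelihood, an observation only matters at a death time if it is still in the risk set at that time. The sketch below, with invented times, lists the risk set at each death and checks whether any machine censored before the first death ever appears in one. The function name `risk_sets` is made up for this illustration.

```python
def risk_sets(times, events):
    """Map each observed death (by index) to the indices still at risk at that time."""
    sets = {}
    for i, (t_i, died) in enumerate(zip(times, events)):
        if died:
            sets[i] = [j for j, t_j in enumerate(times) if t_j >= t_i]
    return sets


# Hypothetical data: three machines censored before day 1500, three deaths after it.
times = [400, 900, 1200, 1550, 1700, 2100]
events = [False, False, False, True, True, True]

rs = risk_sets(times, events)
censored = {i for i, died in enumerate(events) if not died}
used = {j for members in rs.values() for j in members}
print("censored machines that appear in some risk set:", censored & used)
```

In this setup the censored machines never enter a risk set, so they add no information to the fit: the estimates are the same as if only the machines that died after 1500 days had been included.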
 

Karabiner

TS Contributor
#13
I do not quite know what you mean. Your last description was that you include all machines,
and predict their lifetime with your 10 predictor variables. Do you now mean that you included
only machines with lifetime at least 1500 days, and predict the lifetime for that subsample?
In that case, I do not see why that should be a problem for the Cox regression. The
regression does not know that the sample with lifetime > 1500 is only a subsample.

This is your third or fourth thread on this, and I get the feeling that the very same things are
discussed again and again and again. Maybe it's about time that you just perform your Cox
analysis and see what you get. Or, that you tell your supervisor that you need some real life
assistance.

With kind regards

Karabiner
 
#14
I tried many Cox regression models with different samples, and as I was not too happy with the results, I was trying to find out where my mistake was.
If I had any assistance at hand, I would not have posted all these questions on this forum.
Thanks for the help.
 

Buckeye

Active Member
#15
IMO, it is difficult to help with a complex stats problem when we have limited knowledge of the research question and no visible results. But I think each of us has done a good job probing the issues. You seem fixed on Cox regression as the sole way to solve the problem. How you frame the question may lead to a different approach. For example, you can predict "time to death" with a GLM for count data (or a gamma-family model if the dependent variable is continuous).