The most appropriate method in this case

#1
Hello everyone,

Here is my problem:
I have a database that contains 1500 machines located in 8 different companies. The goal is to find the cause of death of the machines that lived the longest (I have about 10 explanatory variables). To do this, I focus on the machines that lived more than 1500 days. Among them, 900 died after more than 1500 days and 1 is still alive after more than 1500 days.
I wanted to use Cox regression, which would give me the risk of death associated with each variable in the model.
However, I only have one machine still alive past 1500 days, against 900 dead ones, so I would have almost no censored data in my model. Is this a problem?
Is there another method that would be more appropriate for my problem?

Best regards,
Stat_member.
 

Miner

TS Contributor
#2
I think Cox regression is overkill here. Your stated goal was to determine the cause of death for machines that have lived more than 1500 days. A simple Pareto chart of the 900 failures will provide this information. The one surviving machine will not change your conclusions.

Just because you can, doesn't mean that you should.
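
For readers who have not built one, the numbers behind a Pareto chart can be sketched as below. This is only an illustration: it assumes each dead machine has a recorded failure cause, and the cause labels and the function name `pareto_table` are hypothetical, not taken from the poster's data.

```python
from collections import Counter


def pareto_table(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        rows.append((cause, count, round(100 * cumulative / total, 1)))
    return rows


# Hypothetical failure causes, one entry per dead machine.
failures = ["overheating"] * 5 + ["corrosion"] * 3 + ["seal wear"] * 2

for row in pareto_table(failures):
    print(row)
```

The bars of the chart are the counts and the line is the cumulative percentage, so the few causes that account for most failures appear first.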
 
#3
Hello,

Thank you for your feedback @Miner !
Can generalized models or penalized regression not be used in this case either?
I was expecting one of the statistical methods already mentioned...
So it comes down to simple charts.

I have a question about Cox regression. If I only look at tanks older than 1500 days, I have about 900 dead and one alive. If I add to this sample the still-living tanks that are less than 1500 days old (purely so that I can use Cox), might the results be distorted?

Thanks
 

Miner

TS Contributor
#4
Can generalized models or penalized regression not be used in this case either?
I was expecting one of the statistical methods already mentioned...
Why? Your stated goal was to determine the cause of death for machines that have lived more than 1500 days. Regression of any type will not provide this information.

I have a question about Cox regression. If I only look at tanks older than 1500 days, I have about 900 dead and one alive. If I add to this sample the still-living tanks that are less than 1500 days old (purely so that I can use Cox), might the results be distorted?
Always include all of your available data when performing a reliability analysis. Excluding the data less than 1500 days will distort your results.
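
To illustrate why dropping still-running units distorts a reliability analysis, here is a minimal sketch using a Kaplan-Meier survival estimate, swapped in as a simpler stand-in for Cox regression. The four machines and their durations are made up, and `kaplan_meier` is a hand-written helper, not a library function.

```python
def kaplan_meier(durations, died):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: days each machine was observed
    died: True if the machine failed, False if it was still running (censored)
    """
    data = sorted(zip(durations, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all machines that leave observation at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored machines leave the risk set without counting as deaths.
        at_risk -= removed
    return curve


# Hypothetical data: two failures, two machines still running.
full = kaplan_meier([100, 200, 300, 400], [True, False, True, False])
# Same data with the still-running machines thrown away.
deaths_only = kaplan_meier([100, 300], [True, True])

print(full)
print(deaths_only)
```

With all four machines, survival past day 300 is estimated at 0.375; with the running machines removed it drops to 0, because the units that did not fail no longer count in the risk set.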
 

Buckeye

Active Member
#6
Where in the statement of the problem did you find an application to PCA? lol.

To Miner's point, we can't determine causal relationships from a regression analysis. Unless we...
See also https://en.wikipedia.org/wiki/Design_of_experiments

I don't understand why you can't do logistic regression with alive or dead machines (all data) and use the age as an explanatory variable amongst others. This will tell you which variables increase or decrease the odds of death and by how much.
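
A rough sketch of that idea, assuming a dead/alive flag, the age, and one other explanatory variable per machine. The eight data rows are invented, and the fit is a hand-rolled gradient descent rather than a library routine, so treat it as an illustration only. As noted above, the resulting odds ratios describe association, not cause.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of death
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical rows: [age in thousands of days, ran hot (1) or not (0)].
X = [[0.5, 0], [0.8, 1], [1.0, 0], [1.2, 1],
     [1.5, 0], [1.6, 1], [1.8, 0], [2.0, 1]]
# 1 = dead, 0 = still alive.
y = [0, 0, 0, 1, 1, 1, 0, 1]

intercept, coef_age, coef_hot = fit_logistic(X, y)
# exp(coefficient) is the odds ratio for a one-unit increase in that variable.
print("odds ratio, age:", math.exp(coef_age))
print("odds ratio, hot:", math.exp(coef_hot))
```

An odds ratio above 1 means the variable is associated with higher odds of death, below 1 with lower odds.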
 
#7
Why? Your stated goal was to determine the cause of death for machines that have lived more than 1500 days. Regression of any type will not provide this information.

Always include all of your available data when performing a reliability analysis. Excluding the data less than 1500 days will distort your results.
The problem is that I don't know the cause of death for any of the dead machines. That is exactly what I need to find out, so I don't see how a Pareto chart would help me. I don't know whether they died because the temperature was too high or because of some other factor...
 
#8
Where in the statement of the problem did you find an application to PCA? lol.

To Miner's point, we can't determine causal relationships from a regression analysis. Unless we...
See also https://en.wikipedia.org/wiki/Design_of_experiments

I don't understand why you can't do logistic regression with alive or dead machines (all data) and use the age as an explanatory variable amongst others. This will tell you which variables increase or decrease the odds of death and by how much.
I was asked to focus only on machines that lived longer than 1500 days, because those that died earlier died of an external cause.
I don't want to know the cause of early deaths, only that of machines that lived out their time. It is estimated that a machine that lived at least 1400 days lived well, while those that died earlier died "young".
So I'm trying to find out the cause of death for machines that lived more than 1500 days. I'm not interested in those with less than 1500 days. And I only have 900 machines that died after 1500 days and a single machine, still alive today, that is over 1500 days old.