Which variable for Cox regression?

#1
Hi guys,

I have a database of patients with their long-term survival data. Among other things, I have one variable that represents a scoring system used to classify patients according to the severity of the disease (ordinal variable, 4 classes). This classification system was recently revised, and I have a second variable for the revised version. Now I'd like to investigate whether either score is independently associated with long-term survival. For this, I would like to run a multivariable Cox regression including further baseline variables as covariates. My question is: does it make sense to include BOTH scoring variables in the multivariable analysis? Or should I run two separate multivariable analyses, one containing the old score and the other containing the revised score as a covariate?

I hope I expressed myself clearly enough. Your help is greatly appreciated! Thanks guys!
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
So you have two versions of the same variable (which will likely get treated as a categorical variable, with the lowest or highest group as the reference). Do both versions have the same number of groups? What is the explicit question you want to answer?
 
#3
Hey hlsmith, so it's as follows:
One variable is coded as 0, 1, 2, 3 (with 3 being the most severe form of the disease, so 0 is the reference).
Then I have the revised definition of the variable, which is coded as 0, 1, 2, 3, 4.

My goal is to compare both definitions in terms of their ability to predict long-term survival. My idea was to run a multivariable Cox regression with both scores as covariates (alongside a handful of additional baseline covariates). But I am not sure if this makes sense.
 

hlsmith

#4
Well, you will have a different number of degrees of freedom between the models. I am trying to think whether you can do a nested model comparison between the models (a -2 log likelihood test and a calibration curve comparison). Such approaches would have you fit the model twice, with each fit differing by which of the two variables you included.

How much data do you have? Another option may be to do a data split: fit both models to the training set, then score the held-out set and see which model performs better.
 
#5
I have about 430 observations. A data split sounds good. How about I conduct two multivariable Cox regression analyses, one containing the old score as a covariate and another containing the new variable, and see which one performs better?
 

hlsmith

#6
OK, I am on an actual computer now. Random data splitting is traditionally used in modeling for comparing candidate models. I haven't done this in survival analyses before. I am not sure you can fully score the new set given that it is time-to-event data. If you find out otherwise I would be interested in hearing about it. But I know you can somehow develop a calibration curve out of survival data, so something could be possible. Of note, when splitting data you can run into sample size concerns if you have a rare outcome or an overparameterized model.
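On scoring a held-out set with time-to-event data: one commonly used option is Harrell's concordance index (C-index), which only needs each held-out patient's follow-up time, event indicator, and a predicted risk score (for a Cox model, the linear predictor). Below is a minimal sketch in plain Python, not a definitive implementation. The function name `harrell_c` and all of the toy numbers are made up for illustration, and as a simplification it skips pairs with tied follow-up times.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's C for right-censored data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher value = higher predicted risk (e.g. Cox linear predictor)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is only comparable if the shorter time ended in an event.
        # Simplification (assumption): pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / usable


# Toy held-out set (hypothetical values, purely illustrative).
heldout_times = [5, 8, 12, 20, 25]
heldout_events = [1, 0, 1, 1, 0]
lp_old_score = [1.2, 0.3, 0.9, 0.1, 0.4]   # linear predictor, old-score model
lp_new_score = [1.5, 0.2, 0.4, 0.6, 0.1]   # linear predictor, revised-score model

print("C (old score model):    ", harrell_c(heldout_times, heldout_events, lp_old_score))
print("C (revised score model):", harrell_c(heldout_times, heldout_events, lp_new_score))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. With roughly 430 observations the held-out set will be small, so the difference between the two C values may be noisy.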

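Traditional approaches to comparing models would be using the difference in the two models' AIC values or doing a -2 log likelihood test. Note that the -2 log likelihood test assumes nested models, and the old-score and revised-score models are not nested in each other, so the AIC difference is the more direct comparison here.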
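For the AIC route, the arithmetic is simple once both Cox models have been fitted to the same patients. Here is a rough sketch, assuming both scores enter as categorical variables (so 3 dummy coefficients for the 4-level old score and 4 for the 5-level revised score). The number of baseline covariates and the log partial likelihood values are placeholders invented for illustration, not results from any data; substitute the values reported by your software.

```python
def cox_aic(log_partial_likelihood, n_params):
    """AIC = -2 * log partial likelihood + 2 * number of estimated coefficients."""
    return -2.0 * log_partial_likelihood + 2.0 * n_params


def n_dummy_coefs(n_levels):
    """A categorical variable with k levels needs k - 1 coefficients."""
    return n_levels - 1


N_BASELINE = 5  # hypothetical number of additional baseline coefficients

k_old = n_dummy_coefs(4) + N_BASELINE   # old score: levels 0-3
k_new = n_dummy_coefs(5) + N_BASELINE   # revised score: levels 0-4

# Placeholder log partial likelihoods (assumed values, NOT real results).
loglik_old = -850.0
loglik_new = -846.5

aic_old = cox_aic(loglik_old, k_old)
aic_new = cox_aic(loglik_new, k_new)

# Lower AIC is preferred; the extra coefficient of the revised score is penalised.
print("AIC old score model:    ", aic_old)
print("AIC revised score model:", aic_new)
print("Difference (old - new): ", aic_old - aic_new)
```

The comparison is only meaningful if both models use exactly the same set of patients, so rows with a missing value in either score should be dropped from both fits.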