Comparing 2 similar regression models with and without continuous variables

#1
I have derived a statistical prediction model using backwards stepwise logistic regression in SPSS.

There are 5 independent variables which predict mortality, one of which is age (the other 4 are binary variables).

I have decided to dichotomise age so that the score can be used easily. I am aware that this risks losing prognostic information. Therefore, I want to compare two versions of the 5-variable model: one with age dichotomised and one with age as a continuous variable. The other 4 variables, the population and the outcome all remain the same.

Having looked into this online, I am getting slightly conflicting information, which may simply reflect differences of opinion; however, I cannot find which figure is best to quote in this situation.

I think it is probably best to present the -2 log likelihood for both models, or I could use the likelihood-ratio chi-square. A (pseudo) R square is another option, but I am aware that people sometimes find it confusing to interpret. I know that the two scores do not really differ, as these 3 numbers are essentially the same for both models, and the areas under the associated ROC curves differ by only 0.005.
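A minimal Python sketch of how the three figures mentioned above relate, assuming SPSS-style definitions: the likelihood-ratio chi-square is the null -2LL minus the model -2LL, and the Cox & Snell and Nagelkerke R square values are computed from the same two -2LL values and the sample size. The null -2LL of 1100 and n = 1000 are hypothetical placeholders (not from this thread); the model -2LL values are the ones reported later in the thread. Because both models share the same null -2LL and n, all three summaries are monotone functions of the model -2LL, so they necessarily rank the two models the same way.

```python
import math


def summarise_fit(neg2ll_null: float, neg2ll_model: float, n: int) -> dict:
    """Derive common logistic-regression fit summaries from -2 log likelihoods.

    neg2ll_null  : -2 log likelihood of the intercept-only model
    neg2ll_model : -2 log likelihood of the fitted model
    n            : number of observations used in the fit
    """
    # Likelihood-ratio (model) chi-square: improvement over the null model.
    chi_square = neg2ll_null - neg2ll_model
    # Cox & Snell R^2 = 1 - (L0 / L1)^(2/n), rewritten in terms of -2LL.
    cox_snell = 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)
    # Nagelkerke R^2 rescales Cox & Snell by its maximum attainable value.
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"chi_square": chi_square, "cox_snell": cox_snell, "nagelkerke": nagelkerke}


# HYPOTHETICAL null -2LL and sample size, for illustration only.
NULL_NEG2LL = 1100.0
N = 1000
for label, neg2ll in [("age dichotomised", 950.356), ("age continuous", 950.820)]:
    print(label, summarise_fit(NULL_NEG2LL, neg2ll, N))
```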

Any help is appreciated.
 
#2
As suggested by the forum guidelines, I am bumping this as I have not had an answer. Please let me know if people have not replied because my description is inadequate, and how I could make it clearer. I appreciate any comments/help. Thanks
 
#3
It is generally suggested NOT to dichotomize an explanatory variable. It will (in general) give biased and inconsistent estimates. To me it seems intuitively strange to treat a 45-year-old as the same as an 85-year-old but different from a 39-year-old, who in turn is treated as equal to a 10-year-old.
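A minimal Python sketch of the objection above. The cut-off of 40 is a hypothetical assumption (the thread does not say where age was split); the point is only that a single threshold collapses very different ages into the same category.

```python
# HYPOTHETICAL cut-off: the thread does not say where age was split.
AGE_CUTOFF = 40


def age_dichotomised(age: float) -> int:
    """Code age as 1 if at or above the assumed cut-off, otherwise 0."""
    return 1 if age >= AGE_CUTOFF else 0


# Dichotomised age gives 10 and 39 the same code, and 45 and 85 the same code,
# whereas a continuous age term keeps all four values distinct.
for age in (10, 39, 45, 85):
    print(age, "->", age_dichotomised(age))
```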

It is better to enter age as a “continuous” regression variable, possibly with a squared term. Another good option is a GAM (generalized additive model).
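A minimal Python sketch of a logistic model with a linear and a squared age term, as suggested above. The coefficients are hypothetical values chosen only for illustration, not estimates from the model in this thread; `other_terms` stands for the summed contribution of the four binary predictors and is an assumed parameter name.

```python
import math

# HYPOTHETICAL coefficients for illustration only; NOT estimates from this thread.
INTERCEPT = -6.0
B_AGE = 0.05
B_AGE_SQ = 0.0004


def linear_predictor(age: float, other_terms: float = 0.0) -> float:
    """Log-odds of death with linear and squared age terms.

    other_terms is the summed (coefficient * value) contribution of the
    remaining binary predictors, supplied by the caller.
    """
    return INTERCEPT + B_AGE * age + B_AGE_SQ * age ** 2 + other_terms


def mortality_probability(age: float, other_terms: float = 0.0) -> float:
    """Convert the log-odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age, other_terms)))


for age in (10, 39, 45, 85):
    print(age, round(mortality_probability(age), 3))
```

Unlike the dichotomised coding, every age gets its own predicted risk, and the squared term lets the effect of age grow faster at older ages.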

It is often suggested not to use stepwise selection. It is better to think about the problem yourself and try different models; stepwise selection will often give incorrect models. A possible alternative would be LASSO regression (search for it). I don't know if you can run GAM or LASSO in SPSS.
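For reference, LASSO logistic regression is usually written as the standard L1-penalised log-likelihood below (notation assumed here, not from the thread: y_i is the 0/1 outcome, x_ij the predictors, and λ ≥ 0 a tuning parameter typically chosen by cross-validation). The intercept is conventionally left unpenalised; large λ shrinks some coefficients exactly to zero, which performs variable selection without a stepwise search.

```latex
\hat{\beta}_{\text{lasso}}
  = \arg\max_{\beta_0,\,\beta}
    \left\{ \sum_{i=1}^{n} \left[ y_i \eta_i - \ln\left(1 + e^{\eta_i}\right) \right]
    - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
```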
 

rogojel

TS Contributor
#4
Hi,
One widely accepted measure would be the AIC (Akaike Information Criterion): the lower, the better. I doubt that the dichotomous version will fare better.

regards
 
#5
Hi,
Thanks for your comments.
I think I need to read more about generalised additive models, though my first impression is that this will add complexity to the score. The score needs to be easily calculated at a patient's bedside without the need for a computer or calculator, which is why I chose to dichotomise the fifth variable (the other four are binary). The imperfect score that is used is better than the perfect score that is never used. The simple score including the dichotomised variable is far stronger than clinical judgement, which is essentially what we are competing with.
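A minimal Python sketch of how such a bedside score and its ROC area could be computed. It assumes, hypothetically, one point per risk factor present across the five binary items (the thread does not say how items are weighted). The AUC is computed as the concordance (c) statistic: the share of (died, survived) pairs in which the patient who died has the higher score, with tied scores counted as half. Ties are common with a 0-5 integer score, which is one reason small AUC differences between models, such as the 0.005 mentioned earlier, should be read cautiously.

```python
from itertools import product


def bedside_score(items: list[int]) -> int:
    """Unweighted bedside score: one point per risk factor present.

    ASSUMPTION: equal one-point weights, used purely for illustration.
    """
    if any(v not in (0, 1) for v in items):
        raise ValueError("each item must be coded 0 or 1")
    return sum(items)


def auc(scores: list[float], outcomes: list[int]) -> float:
    """Area under the ROC curve as the concordance (c) statistic.

    Over all (died, survived) pairs, count how often the patient who died has
    the higher score; tied scores count as half a concordant pair.
    """
    died = [s for s, y in zip(scores, outcomes) if y == 1]
    survived = [s for s, y in zip(scores, outcomes) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for d, s in product(died, survived):
        if d > s:
            concordant += 1.0
        elif d == s:
            concordant += 0.5
    return concordant / (len(died) * len(survived))
```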

I will also do more reading about LASSO; thanks for the suggestion.

AIC is an interesting suggestion; I will look into this, thanks.
 
#6
Hi,
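From reading about AIC, my understanding is that it is based on the likelihood function of the model and the number of estimated parameters:
AIC = 2k - 2ln(L)
where k is the number of estimated parameters and L is the maximised value of the likelihood function.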

The number of parameters in each model is the same, so the comparison really comes down to the likelihood.
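Worked out from the AIC formula above, assuming both models have the same k (an intercept plus one coefficient for each of the five predictors), the 2k terms cancel and the AIC difference equals the difference in -2 log likelihood. With the values reported below, that difference is 950.820 - 950.356 = 0.464.

```latex
\mathrm{AIC}_1 - \mathrm{AIC}_2
  = \left(2k - 2\ln L_1\right) - \left(2k - 2\ln L_2\right)
  = \left(-2\ln L_1\right) - \left(-2\ln L_2\right)
```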

As mentioned, there is almost no difference in the -2 log likelihood between the two models:
Model 1 (age categorised): -2 log likelihood = 950.356
Model 2 (age continuous): -2 log likelihood = 950.820
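A minimal Python sketch that turns the two -2 log likelihood values above into AIC values and Akaike weights (the relative support for each model within this pair of candidates). It assumes k = 6 for both models (intercept plus one coefficient per predictor). Because the models are not nested, AIC is a more natural comparison here than a likelihood-ratio test.

```python
import math

# -2 log likelihood values as reported in the thread.
NEG2LL = {
    "Model 1 (age categorised)": 950.356,
    "Model 2 (age continuous)": 950.820,
}
# ASSUMPTION: intercept + 5 predictors, one coefficient each.
K = 6


def aic(neg2ll: float, k: int) -> float:
    """AIC = 2k - 2 ln(L), i.e. the -2 log likelihood plus 2k."""
    return neg2ll + 2 * k


def akaike_weights(aics: dict[str, float]) -> dict[str, float]:
    """Relative support for each model within the given candidate set."""
    best = min(aics.values())
    rel = {name: math.exp(-(a - best) / 2) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


aics = {name: aic(v, K) for name, v in NEG2LL.items()}
weights = akaike_weights(aics)
best_aic = min(aics.values())
for name in NEG2LL:
    print(f"{name}: AIC = {aics[name]:.3f}, "
          f"delta = {aics[name] - best_aic:.3f}, weight = {weights[name]:.3f}")
```

The AIC difference of about 0.46 is well below the commonly quoted rule of thumb of about 2 units, so on this criterion the data give roughly comparable support to both versions.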

Based on this, my inclination is to report the -2 log likelihood.

Once again, I appreciate people's comments, thanks.