# Cox proportional hazard analysis - interpreting coefficients

#### marco_c88

##### New Member
Hi everyone, thank you very much in advance for taking the time to read my question!

I have performed a Cox proportional hazards analysis on some variables related to the time-to-extinction of a certain population. The predictors vary between 0 and 100% but can only take values that are multiples of 20 (i.e., 0, 20, 40, 60, 80 and 100). The estimated coefficient (exponentiated) for one independent variable is 1.114587, but I am not sure how to interpret it. Does it mean that each additional percentage point causes a decrease of 11.46% in the rate of survival, or that each 20-point increase causes a decrease of 11.46%?

Thank you very much!

Marco

#### hlsmith

##### Omega Contributor
What program are you using, and can you post the output (results) exactly as the program generated it? Did you enter the predictor as a continuous variable? It looks more like ordinal categories. Can you also post a histogram of the variable's distribution? If you entered it as a continuous variable, the likely interpretation is a 1.11-fold increase in the hazard for a 1-unit increase in the variable. You would want to present the value with confidence intervals so others know the precision of the estimate.

#### ondansetron

##### TS Contributor
In a way, I think these will still be on the ratio scale. They're ordered, differences of 20% (for example) are the same at any point, true zero exists, and 40% is twice 20% and 80% is twice 40%. Maybe the issue arises due to forcing limited responses, so measurement error is larger in the X-variables. Although, the loss of information may make this functionally more like an ordinal (maybe even interval) data measurement.

I think if you only have the data in absolute increments of 20%, it is fair to interpret it that way, but you can still say "for every 1% increase in X" as long as you scale beta by 1/20. This is no different from multiplying by a factor of 10 to turn "for every $1 increase in x" into "for every $10 increase in x" (assuming linearity).

#### hlsmith

##### Omega Contributor
Yeah, a couple of concerns I had were the linearity and the covariate's distribution. Is it appropriate to say that a 2-unit increase follows the same linear relationship as a 1-unit increase? And if so, are the values clustered at certain points or are they dispersed? Ideally, you would use the true underlying variable values, as mentioned (loss of information). Can you tell us what the variable is, so we understand the potential limitations?

@ondansetron - I have not done such rescaling before. It seems plausible but perhaps at risk of greater misrepresentation of the phenomenon. If I have 1-unit increments and I rescale the coefficient to 5-unit increments and there is a linear relationship, easy enough. If I try to downscale to 0.5 units, perhaps you miss minuscule differences that were glossed over. Probably not a huge deal. But my mind keeps going to the quintessential overfitting figures: extending a line fit, and then going backwards from a fitted straight line to one with smaller segments. That would be fine, but it may be biased to try to describe a finer-grained phenomenon that was not modeled in the first place. You would be taking the smoothed line and breaking it into smaller segments, but it would have the same shape, right?

#### marco_c88

##### New Member
Hi ondansetron and hlsmith, thank you for your answers! OK, let me clarify. The data that I'm using are auto-generated from agent-based simulations. The agent-based model simulates a population with different characteristics. Each combination of characteristics produces a different scenario.
The population is split into a majority and a minority. What I want to study is the survival (time to extinction) of the minority group based on different characteristics. For example, there is a variable that defines their level of "exogamy", i.e., how frequently minority people couple with majority people. My assumption is that the offspring of such couples will be considered majority individuals, therefore reducing the proportion of minority individuals in the whole population. The variable "exogamy" varies between 0 and 100% (that is, I artificially create scenarios in which people have an exogamy rate between 0 and 100%) in steps of 20 points (in other words, no scenario has an exogamy rate of, say, 10% or 31%). This clearly has an impact on the time to extinction of the minority: larger exogamy rates are associated with quicker decline. Now, here's the question: if the estimated coefficients for this variable are 0.108483 (coef) and 1.114587 (exp[coef]), how do I interpret them? Thanks!

#### hlsmith

##### Omega Contributor
I would enter the variable into the model as a categorical variable and then select one group as the reference group, either 0% or 100%. Then you will get hazards relative to the reference for the 5 other groups, so 5 coefficients.

If the outcome is extinction, then each coefficient would represent the relative increase in the hazard of extinction during the study period for that group. E.g., if exp(coef) = 1.40 for 20% vs. 0%, the 20% group has a 40% greater hazard (rate) of extinction than the 0% group. Does that help? If you are using survival regression, I would also turn on the survival plot to better visualize the outcomes.

#### marco_c88

##### New Member
Sounds very reasonable! OK, I will try this and get back to you! Thank you very much!

#### marco_c88

##### New Member
So, here's my output after switching from numeric to character variables (I assume it used 0 as the reference group). How do I interpret these numbers?

#### marco_c88

##### New Member
Let me get this straight: are you saying that I could say either that a 20% increase in X corresponds to an 11.46% increase in the risk of extinction, or that a 1% increase in X corresponds to a (11.46/20)% = 0.573% increase in the risk of extinction?
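
A minimal sketch of the arithmetic behind this question, assuming the model is log-linear in X and that the reported coefficient 0.108483 is the log hazard ratio per one unit of X as it was coded. Whether "one unit" means 1 point or 20 points depends on how the variable was entered, and the function name `hazard_ratio_for_step` is hypothetical. The point it tries to show is that rescaling acts on the coefficient, so the hazard ratio is raised to a power rather than the percentage being multiplied or divided:

```python
import math

COEF = 0.108483  # coefficient reported in the thread (log hazard ratio per unit of X)


def hazard_ratio_for_step(coef: float, step: float) -> float:
    """Hazard ratio for a `step`-unit increase in X, assuming a log-linear effect."""
    return math.exp(coef * step)


hr_one_unit = hazard_ratio_for_step(COEF, 1)        # reproduces the reported exp(coef)
hr_twenty_units = hazard_ratio_for_step(COEF, 20)   # same as hr_one_unit ** 20
hr_twentieth = hazard_ratio_for_step(COEF, 1 / 20)  # same as hr_one_unit ** (1 / 20)

# Dividing the percentage change by 20 is only an approximation of the rescaled value.
naive_pct = (hr_one_unit - 1) * 100 / 20
rescaled_pct = (hr_twentieth - 1) * 100

print(f"per 1 unit:    HR = {hr_one_unit:.6f}")
print(f"per 20 units:  HR = {hr_twenty_units:.3f}")
print(f"per 1/20 unit: HR = {hr_twentieth:.6f}")
print(f"naive {naive_pct:.3f}% vs rescaled {rescaled_pct:.3f}%")
```

Under these assumptions, the two readings in the question differ only in which step size the coefficient is multiplied by before exponentiating.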

#### hlsmith

##### Omega Contributor
Well, since you have created 0% and 20% categories and we treated it like a group, we need to be a little careful with interpretations. If I had to articulate it I would say,

Based on agent-based simulations, the population with 20% exogamy had an 11.4 (95% CI: 11.3, 14.7) times higher hazard rate of extinction relative to the population with 0% exogamy.

Since you will now be making 5 comparisons (20 vs. 0, ..., 100 vs. 0), I would think about changing your 95% CIs to 99% CIs to safeguard against familywise error. It is probably not terrible if you don't, but it may be good practice.
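
A Bonferroni correction is one common way to arrive at a wider interval like this; treating it as the intended adjustment is an assumption here, not something stated in the thread. A minimal sketch for the five non-reference levels (20, 40, 60, 80 and 100, each compared with 0), using a normal approximation for the interval multiplier:

```python
from statistics import NormalDist

n_comparisons = 5    # 20, 40, 60, 80 and 100, each versus the 0% reference
family_alpha = 0.05  # familywise error rate to protect (assumed)

# Bonferroni: split the familywise alpha evenly across the comparisons.
per_comparison_alpha = family_alpha / n_comparisons
ci_level = 1.0 - per_comparison_alpha

# Normal-approximation multipliers for unadjusted and adjusted intervals.
z_unadjusted = NormalDist().inv_cdf(1.0 - family_alpha / 2)
z_adjusted = NormalDist().inv_cdf(1.0 - per_comparison_alpha / 2)

print(f"per-comparison CI level: {ci_level:.2%}")
print(f"z multiplier: {z_unadjusted:.3f} -> {z_adjusted:.3f}")
```

The adjusted interval for a hazard ratio would then be exp(coef ± z_adjusted × SE), with the standard error taken from the model output.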

Dividing by 20 may be inappropriate; it might be more complicated than just dividing. I will think about it. Of note, there is an increase as exogamy increases, but I am unsure about the linearity. @ondansetron - chime in.

P.S., What program did you use to conduct the agent based modeling?

#### hlsmith

##### Omega Contributor
P.S. I don't use PHReg much, but there are model assumptions (e.g., proportional hazards) that need to be tested.


#### hlsmith

##### Omega Contributor
The downscaling of the HR is likely: exp(coef * 0.05)

For fun, you could repeat this for the different comparisons and see how close the estimates are. The above is for 20 versus 0. For 40 versus 0 it would be exp(coef * 0.025).
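
The downscaling above can be sketched as follows, assuming a log-linear effect between the reference level and the compared level. The helper name `per_point_hazard_ratio` is hypothetical, and 11.36 is the 20% vs. 0% hazard ratio quoted elsewhere in this thread:

```python
import math


def per_point_hazard_ratio(hr_vs_reference: float, level_gap: float) -> float:
    """Per-1-point hazard ratio implied by a hazard ratio for a `level_gap`-point difference."""
    coef = math.log(hr_vs_reference)    # back to the log-hazard scale
    return math.exp(coef / level_gap)   # e.g. coef * 0.05 when the gap is 20 points


hr_20_vs_0 = 11.36  # hazard ratio for 20% vs. 0% exogamy quoted in the thread
print(round(per_point_hazard_ratio(hr_20_vs_0, 20), 3))  # 1.129
```

For the 40 vs. 0 comparison the same function would be called with a gap of 40, which corresponds to the factor 0.025.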


#### hlsmith

##### Omega Contributor
Hazard ratio for a 1% increase, based on each category (vs. 0%):

20% 1.129
40% 1.122
60% 1.107
80% 1.109
100% 1.200

I wasn't expecting that to be as consistent, but perhaps it is an artifact of the data being synthetic.
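
One hedged way to look at how similar these values are is to put them back on the log-hazard scale, where a perfectly linear effect would give the same slope per point for every category. This is only a descriptive sketch using the five numbers listed above, not a formal test of linearity:

```python
import math

# Per-1% hazard ratios listed above, keyed by the category compared against 0%.
per_point_hr = {20: 1.129, 40: 1.122, 60: 1.107, 80: 1.109, 100: 1.200}

# Under a log-linear effect, log(HR) per point would be identical across categories.
log_slopes = {level: math.log(hr) for level, hr in per_point_hr.items()}
spread = max(log_slopes.values()) - min(log_slopes.values())

for level, slope in log_slopes.items():
    print(f"{level:>3}% vs 0%: log-hazard slope per point = {slope:.4f}")
print(f"range of slopes across categories: {spread:.4f}")
```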

#### marco_c88

##### New Member
How did you get these numbers? Anyway, I attach the description of the variables and the output of the Cox regression from my first analysis, where I treated the predictors as numeric.

Can you help me interpret the results, please? Was my initial interpretation (1% increase in exogamy rate -> 11.46% increase in hazard) correct?

#### ondansetron

##### TS Contributor
Sorry, hlsmith has the scaling correct for this. I forgot the context and was thinking of ordinary linear regression, so he has the appropriate adjustment.

I think there is a question, though, about whether scaling like that is appropriate, because it seems you created a variable with so few levels, although they seem to fit at least interval measurement, if not ratio.

The other thing to note is that hazards and hazard ratios do not necessarily, and often do not, refer to risk. Risk and hazard are two different ideas.
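
A small illustration of the distinction, under the simplifying assumption of a constant hazard (exponential survival). The hazard values and follow-up time are hypothetical and not taken from the thread's data:

```python
import math


def risk_by_time(hazard: float, t: float) -> float:
    """Cumulative risk of the event by time t under a constant hazard."""
    return 1.0 - math.exp(-hazard * t)


baseline_hazard = 0.05  # hypothetical events per unit time in the reference group
hazard_ratio = 2.0      # hypothetical hazard ratio for the comparison group
follow_up = 10.0        # hypothetical follow-up time

risk_reference = risk_by_time(baseline_hazard, follow_up)
risk_comparison = risk_by_time(baseline_hazard * hazard_ratio, follow_up)
risk_ratio = risk_comparison / risk_reference

print(f"hazard ratio: {hazard_ratio:.2f}")
print(f"risk ratio at t={follow_up:g}: {risk_ratio:.3f}")
```

In this sketch the hazard is doubled, but the risk of the event by the end of follow-up is less than doubled, and the gap between the two ratios changes with the follow-up time chosen.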


#### hlsmith

##### Omega Contributor
A 1% increase in exogamy is associated with an expected 1.11 times greater relative hazard for extinction.

As @ondansetron mentioned, you would have to think about whether it is appropriate to examine a 1-unit increase given the sparseness of the simulated values. The values I estimated above would support a possible linear trend, e.g., log(11.36) = 2.43 and exp(2.43 * 0.05) = 1.129. I would also speculate that this is related to how you simulated your data.

@ondansetron - I don't use hazards that much - just every couple of years. Would you have your own generic interpretation of them for me??