Apologies if this has been asked before, but I could not find anything relevant.
I have created a physiological score (a continuous variable with values between 0 and 1) that correlates with a disease (a binary variable, 0 = healthy, 1 = patient). The idea is to use this score to predict the disease, since the score is relatively easy to measure.
The dataset that I have available, however, is unbalanced with regard to age (healthy people are on average younger than patients). We also know that age plays some (indirect) role in this disease.
I am trying to determine whether this age imbalance makes my analysis problematic. In particular, I need to be sure that the good correlation between the score and the disease is not simply due to the difference in age between the two groups.
For this reason, I fitted two separate binomial GLMs (logit link):
Disease ~ age
Disease ~ score + age
According to the results, adding the score variable reduced the residual deviance substantially (from 110 for the model with age only to 72 for the model with score and age). Furthermore, in the second model the coefficients of both variables are statistically significant.
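To make the comparison concrete, here is a minimal sketch of a likelihood-ratio test on the drop in deviance. It rests on assumptions that go beyond what is stated above: that 110 and 72 are residual deviances, that both models were fitted to exactly the same observations, and that the models are nested and differ by a single parameter (the score coefficient), so the difference is compared with a chi-square distribution with 1 degree of freedom. The function name `lr_test_1df` is hypothetical.

```python
import math


def lr_test_1df(deviance_reduced, deviance_full):
    """Likelihood-ratio test for two nested GLMs that differ by ONE parameter.

    Assumes both deviances come from models fitted to the same observations.
    Returns the test statistic and its chi-square (1 df) p-value.
    """
    stat = deviance_reduced - deviance_full
    if stat < 0:
        raise ValueError("the fuller model cannot have a larger deviance")
    # For a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Deviances quoted above: age only (110) vs. score + age (72).
stat, p_value = lr_test_1df(110.0, 72.0)
print(f"LR statistic = {stat:.1f}, p-value = {p_value:.2e}")
```

A small p-value here would only indicate that the score is associated with the disease at a given age, under the assumption that age enters the logit linearly. It does not by itself measure how well the score predicts the disease.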
Is this enough to show that, even after adjusting for age, the physiological score has some explanatory power for the disease?