I'm trying to perform logistic regression to determine risk factors for retinopathy of prematurity (ROP). The outcome is binary: 1 = required treatment, 0 = did not require treatment.

Two important continuous input variables are gestational age and birthweight.

The more premature you are (lower gestational age) the more likely you are to get ROP.

The lower your birthweight, the more likely you are to get ROP.

But the more premature you are, the more likely you are to have a low birthweight.

Does the logistic regression in SPSS correct for this, or are my regression coefficients going to be spurious?
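To make the overlap concrete, here is a minimal sketch in Python (not SPSS) of how one might measure how strongly gestational age and birthweight move together before fitting anything. The numbers are made up purely for illustration and are not my data. The variance inflation factor shown is the special case for exactly two predictors, where it reduces to 1 / (1 - r^2).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Made-up illustrative values, NOT real patient data.
gestational_age_weeks = [24, 25, 26, 27, 28, 29, 30, 31]
birthweight_grams = [620, 800, 760, 1010, 980, 1300, 1250, 1600]

r = pearson_r(gestational_age_weeks, birthweight_grams)
vif = vif_two_predictors(r)
print(f"correlation between predictors: {r:.3f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A large value here would only tell me that the two predictors carry overlapping information; it would not by itself say whether the fitted coefficients can be trusted.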

I'd also like to add in variables such as sex and ethnicity, but birthweight also depends on these.

The only way I know of to correct birthweight for gestational age and sex is to calculate the child's centile. A child on the 10th centile is at the top of the lowest 10% for their age, a child on the 50th centile is average, and a child on the 90th centile is at the bottom of the top 10% for their age. The problem is that centile plus gestational age does not seem to predict as well as simply using birthweight plus gestational age (the standard way of doing things).
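For anyone unsure what I mean by centile, this is roughly the calculation, sketched in Python. It assumes birthweight is approximately normally distributed within each gestational age and sex group, which is a simplification; published growth charts may use a different method. The reference means and standard deviations below are hypothetical placeholders, not values from a real chart.

```python
from statistics import NormalDist

# HYPOTHETICAL reference table, for illustration only:
# (gestational age in weeks, sex) -> (mean birthweight in g, SD in g)
REFERENCE = {
    (28, "M"): (1100.0, 180.0),
    (28, "F"): (1050.0, 170.0),
}

def birthweight_centile(weight_g, ga_weeks, sex):
    """Centile of a birthweight within its gestational age and sex group,
    under a normal approximation to the reference distribution."""
    mean_g, sd_g = REFERENCE[(ga_weeks, sex)]
    z = (weight_g - mean_g) / sd_g          # standard score within the group
    return 100.0 * NormalDist().cdf(z)      # share of the group at or below this weight

print(f"{birthweight_centile(900.0, 28, 'M'):.1f}")
print(f"{birthweight_centile(1100.0, 28, 'M'):.1f}")
```

The standard score z might be an alternative input to the centile itself, since it keeps the same age and sex adjustment without squeezing the extremes into the 0 to 100 range.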

In the end, what I want to do is work out the probability of requiring treatment given different combinations of the predictor variables.
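What I have in mind is something like the sketch below (Python, just to show the arithmetic). A fitted logistic model gives an intercept and one coefficient per predictor, and the probability is the logistic function of their weighted sum. The intercept and coefficients here are invented for illustration; they are not estimates from my data or from any study.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Probability from a logistic model: 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

# HYPOTHETICAL coefficients, invented for illustration only.
INTERCEPT = 12.0
COEFS = {"gestational_age_weeks": -0.4, "birthweight_grams": -0.002}

# Tabulate the probability over a small grid of predictor combinations.
for ga in (24, 28, 32):
    for bw in (600, 1000, 1400):
        p = predicted_probability(
            INTERCEPT,
            COEFS,
            {"gestational_age_weeks": ga, "birthweight_grams": bw},
        )
        print(f"GA {ga} weeks, BW {bw} g: P(treatment) = {p:.3f}")
```

Categorical predictors such as sex would enter the same sum as 0/1 indicator values with their own coefficients.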

Any help would be greatly appreciated!

Best Wishes

Simon