Strange results of logistic regression

#1
Hi everyone,
I am getting strange results from a logistic regression.
The model is as follows:
dependent variable: success of treatment (0 = No, 1 = Yes);
possible predictors: treatment, coded 1 or 2 (numeric), and sex, with two alternative codings: first (F=0, M=1), second (F=1, M=0).

If I fit the regression with treatment alone, the coefficient for treatment is significant (p=0.03).
If I fit the regression with treatment and sex (either coding) without an interaction, the coefficient for sex is not significant and the coefficient for treatment is significant (p=0.02).

If I fit the regression with treatment and sex (FIRST coding) with an interaction, the coefficient for sex is not significant, the coefficient for treatment is NOT significant (p=0.38), and the interaction coefficient is not significant.

If I fit the regression with treatment and sex (SECOND coding) with an interaction, the coefficient for sex is not significant, the coefficient for treatment is significant (p=0.03), and the interaction coefficient is not significant.

I don't understand why the significance of treatment depends on which coding I use for sex.
Has anyone had to deal with such a situation?
Many thanks in advance!
 

noetsi

Fortran must die
#2
There is no reason how you code a dummy variable should have any impact on anything except the sign of the effect. I would check for a coding mistake or maybe a mistake in the data you are using.

If you have interaction then some question the value of interpreting main effects at all. At the least you have to interpret the effect of a main effect at one specific level (only) of the interacting variable. Not at all levels as you have apparently.
 

hlsmith

Not a robit
#4
Results should be comparable. Post code and output for your last two listed models and we will help unpack the results. This is the easiest way for us to see what you are doing and writing about.

Thanks!
 
#5
"There is no reason how you code a dummy variable should have any impact on anything except the sign of the effect. I would check for a coding mistake or maybe a mistake in the data you are using.

If you have interaction then some question the value of interpreting main effects at all. At the least you have to interpret the effect of a main effect at one specific level (only) of the interacting variable. Not at all levels as you have apparently."
I agree that the coding should not influence the results; that is why I posted this question. I have spent 10 hours on the coding, and I don't see what mistake there could be in the data. Why should I interpret the treatment effect for only one value of sex?
 
#8
I use SAS University Edition, the code is below:

***********************************************************************************

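/* Build the analysis data set: binary response plus two alternative 0/1 codings of sex */
DATA t2_work;
SET t2_source;
FORMAT Response 1.0;
FORMAT Gender_num_F0M1_ 1.0;
FORMAT Gender_num_F1M0_ 1.0;
/* Response = 1 for response categories PR or CR, otherwise 0 */
IF (responseCategory="PR" OR responseCategory="CR") THEN Response=1; ELSE Response=0;
/* First coding: FEMALE=0, MALE=1 */
IF (gender="FEMALE") THEN Gender_num_F0M1_=0; ELSE IF (gender="MALE") THEN Gender_num_F0M1_=1;
/* Second coding: FEMALE=1, MALE=0 */
IF (gender="FEMALE") THEN Gender_num_F1M0_=1; ELSE IF (gender="MALE") THEN Gender_num_F1M0_=0;
RUN;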

PROC FORMAT;
VALUE Response 0='Non-responder' 1='Responder';
VALUE Gender_num_F0M1_ 0='FEMALE' 1='MALE';
VALUE Gender_num_F1M0_ 0='MALE' 1='FEMALE';
RUN;

/* logistic regressions: same model with interaction, once per coding of sex */

proc logistic data=t2_work;
model Response (EVENT='1') = Gender_num_F0M1_ TRTPN Gender_num_F0M1_*TRTPN;
run;

proc logistic data=t2_work;
model Response (EVENT='1') = Gender_num_F1M0_ TRTPN Gender_num_F1M0_*TRTPN;
run;

***********************************************************************************


The results are in the attached file.
 


hlsmith

Not a robit
#9
Well, when dealing with interaction terms it is standard practice to ignore the base terms in the model, since they are conditional on each other and don't have an independent interpretation. Thus, you shouldn't care about the base terms. We can see that the interaction term did not change in size or significance, which is what we would care about.

The base term changed because the base case (reference value) of sex was switched. With the interaction in the model, the TRTPN coefficient is the increase in log odds per unit of treatment for whichever sex is coded 0, and the intercept is the baseline log odds for that same group. Swap the coding and both now describe the other group. This may all seem strange at first, but the interpretations are just different for those terms.
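
One way to check this on the model from post #8 is to ask PROC LOGISTIC for the treatment effect in both groups from a single fit. This is only a sketch, assuming the numeric 0/1 coding and the variable names from that post; the labels are made up for illustration, and it has not been run on the actual data.

```sas
proc logistic data=t2_work;
   model Response (EVENT='1') = Gender_num_F0M1_ TRTPN Gender_num_F0M1_*TRTPN;
   /* Treatment effect (log odds per unit of TRTPN) for females, coded 0: */
   /* this is just the TRTPN coefficient.                                 */
   estimate 'TRTPN effect, females (code 0)' TRTPN 1 / exp cl;
   /* Treatment effect for males, coded 1: TRTPN + interaction.           */
   estimate 'TRTPN effect, males (code 1)' TRTPN 1 Gender_num_F0M1_*TRTPN 1 / exp cl;
run;
```

If the explanation above holds, the second estimate should reproduce the TRTPN coefficient and p-value from the model that uses Gender_num_F1M0_, because both describe the treatment effect in males.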
 

noetsi

Fortran must die
#10
Incidentally, to somewhat disagree with my learned colleague hlsmith :) it is not true that nobody interprets the main effect when an interaction is present. It is not uncommon to interpret the main effect at specific levels of the interacting variable (a different form of interpretation, admittedly). Some call these simple effects.
 

noetsi

Fortran must die
#12
I am not sure how you do this in logistic regression. In linear regression you tell the software to estimate the impact of X1 on Y at some specific level of X2 when X1 and X2 are interacting predictors.

I would have to go back and look at my SAS code to see how you do this.
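
A sketch of one possible way to do this in PROC LOGISTIC is the ODDSRATIO statement, which reports the odds ratio for one predictor at chosen values of an interacting predictor. The variable names are taken from post #8, and the choice of CL=WALD is an assumption for illustration; this has not been run on the poster's data.

```sas
proc logistic data=t2_work;
   model Response (EVENT='1') = Gender_num_F0M1_ TRTPN Gender_num_F0M1_*TRTPN;
   /* Odds ratio for a one-unit increase in TRTPN (treatment 2 vs 1),     */
   /* evaluated separately at Gender_num_F0M1_ = 0 (female) and 1 (male). */
   oddsratio TRTPN / at(Gender_num_F0M1_=0 1) cl=wald;
run;
```

These per-group odds ratios should not depend on which 0/1 coding of sex is used, since they describe the same two groups either way.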
 
#13
Maybe it would be better to treat these predictors as categorical (text) variables, since their values have no natural order? I tried this and got consistent results. By default SAS coded the binary text variables with the values -1 and 1. I tried both assignments: treatment values 1 and 2 as 1 and -1 with gender as FEMALE 1, MALE -1, and conversely treatment values 1 and 2 as -1 and 1 with gender as FEMALE -1, MALE 1. Then I built the models for all combinations, and all the results are the same.
 


noetsi

Fortran must die
#14
"By default SAS assigned binary text variables with values -1 and 1."
From memory that is effect coding. Personally I would use reference coding where the values are normally 0 or 1. This is the most common way analysis is done.
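
A minimal sketch of how reference coding might be requested, assuming the character variable gender (values FEMALE and MALE) and the numeric TRTPN (values 1 and 2) from post #8. The reference levels chosen here are arbitrary and only for illustration.

```sas
proc logistic data=t2_work;
   /* PARAM=REF requests 0/1 reference (dummy) coding; the PROC LOGISTIC */
   /* default for CLASS variables is PARAM=EFFECT (-1/1 coding).         */
   class gender (ref='FEMALE') TRTPN (ref='1') / param=ref;
   model Response (EVENT='1') = gender TRTPN gender*TRTPN;
run;
```

With PARAM=REF and an interaction in the model, each base term again refers to the reference level of the other variable, so changing the REF= level changes what the base terms mean.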
 

noetsi

Fortran must die
#16
Your results make no sense to me. How you code something has nothing to do with its significance. I can only assume an error in the coding or the data, or possibly power issues.