Another strange result from regression

#1
I have run a logistic regression looking at predictors of weight loss (group 1 losing weight vs group 2 not losing weight). My model uses a single entry block (all variables entered at once), e.g. sex (first category as reference), age, number of other disorders, diabetes score, appetite, fatigue, and inflammation markers like CRP. The overall model is significant (i.e. better than block 0), has "good fit" (the goodness-of-fit test is not significant), and explains 30-70% of the variance in the outcome (hope that makes sense). However, none of my individual variables are significant.

I then removed all the clinical baseline measures (sex, age, etc.) and just looked at the key medical variables thought to predict weight loss in this group, e.g. appetite level, diabetes score, CRP, and fatigue scores. Now I have two significant predictors. My question is: do I have to include or control for age and sex, given that the data are already nonparametric (the sample sizes are not the same and the data are not normally distributed), so age, sex, etc. are irrelevant anyway? ...help?
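For concreteness, here is a minimal sketch of what a single-block model like this looks like in Python/statsmodels; the data file and column names (df, lost_weight, crp, etc.) are invented for illustration, not the actual study variables:

```python
# Minimal sketch of the single-block model described above, in
# Python/statsmodels. The file and column names (lost_weight, sex, age,
# n_disorders, diabetes_score, appetite, fatigue, crp) are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weight_loss_study.csv")   # hypothetical file

# All predictors in one entry block; C(sex) uses the first category
# as the reference level by default.
full = smf.logit(
    "lost_weight ~ C(sex) + age + n_disorders + diabetes_score"
    " + appetite + fatigue + crp",
    data=df,
).fit()

print(full.summary())              # per-variable Wald tests (the non-significant coefficients)
print(full.llr, full.llr_pvalue)   # overall model vs. the intercept-only 'block 0'
print(full.prsquared)              # McFadden pseudo R-squared, not literal explained variance
```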
 

noetsi

Fortran must die
#2
It's very difficult to get at "explained variance" in logistic regression; the pseudo R-squareds don't show you that. How are you determining explained variance in your model?

If you have a model that is significant, but no variables that are, then multicollinearity is a likely issue. I think if controls are thought to be theoretically important then you should include them in your model. I don't understand "...do I have to include or control for age and sex, given that the data are already nonparametric ... so age, sex, etc. are irrelevant anyway".
Whether a variable should be in the model has nothing to do with normality (and I am not sure why sample size would change that).
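If multicollinearity is the suspect, one quick check is the variance inflation factors of the predictors; a hedged sketch in Python/statsmodels, reusing the hypothetical column names from post #1:

```python
# One quick check of the multicollinearity explanation: variance
# inflation factors for the predictors. df and column names are
# hypothetical, as in the sketch in post #1.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("weight_loss_study.csv")   # hypothetical file
X = sm.add_constant(df[["age", "n_disorders", "diabetes_score",
                        "appetite", "fatigue", "crp"]])

for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
# Rule of thumb: VIFs well above 5-10 mean the predictors overlap so much
# that individual Wald tests can all be non-significant even when the
# overall model is.
```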
 

hlsmith

Not a robit
#3
@Highhopes!

At this point I like to step back and try to understand what you are working with. What is your sample size? What percentage of the sample has the outcome of interest? How many predictors are you examining, and are any of them categorical with more than 2 groups?

P.S. What does "Another" reference? What was the other strange result?
 
#4
Thanks for replying... The "Another" reference was because a previous thread had a "strange" outcome and so did I; not an imaginative title, apologies. So OK: I decided to do a 'simple' multiple regression. I entered age first and then all other covariates, as I know my two groups are significantly different on age, and found that age and fatigue were significant. I ran it again with all covariates in one entry block and found the same outcome; the model only changed from 92 to 94%. How do I explain that age is a control but is also a significant predictor/contributor in the model?
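One thing worth pinning down is what the "92 to 94%" figure actually is; a hedged sketch showing how the classification-table accuracy and the c-statistic are computed, using hypothetical data and column names as in the earlier sketches:

```python
# Hedged sketch: the "92% vs 94%" could be the classification-table
# accuracy or the c-statistic (AUC); they are not the same thing.
# df and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.metrics import roc_auc_score

df = pd.read_csv("weight_loss_study.csv")   # hypothetical file
model = smf.logit("lost_weight ~ age + appetite + diabetes_score + crp + fatigue",
                  data=df).fit()

y = df["lost_weight"]
p_hat = model.predict()                      # fitted probabilities

accuracy = np.mean((p_hat >= 0.5) == y)      # % correctly classified at a 0.5 cut-off
c_stat = roc_auc_score(y, p_hat)             # c-statistic / area under the ROC curve

print(f"accuracy = {accuracy:.1%}, c-statistic = {c_stat:.3f}")
```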
 

hlsmith

Not a robit
#5
Model only changed from 92 to 94: are you referring to the c-statistic (accuracy)? You can perform a likelihood ratio test of nested models to show whether the addition of a variable contributes to the model. If age contributes, the test will corroborate this.
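A hedged sketch of that nested-model likelihood-ratio test, again with hypothetical data and column names:

```python
# Hedged sketch of the nested-model likelihood-ratio test: does adding
# age improve the model? df and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("weight_loss_study.csv")   # hypothetical file

reduced = smf.logit("lost_weight ~ appetite + diabetes_score + crp + fatigue",
                    data=df).fit()
with_age = smf.logit("lost_weight ~ appetite + diabetes_score + crp + fatigue + age",
                     data=df).fit()

lr_stat = 2 * (with_age.llf - reduced.llf)      # change in -2 log-likelihood
df_diff = with_age.df_model - reduced.df_model  # one extra parameter (age)
p_value = stats.chi2.sf(lr_stat, df_diff)

print(f"LR chi-square = {lr_stat:.2f}, df = {df_diff:.0f}, p = {p_value:.4f}")
# Small p: age genuinely adds to the model. Large p: age can be dropped
# without a meaningful loss of fit.
```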
 

noetsi

Fortran must die
#6
A statistical control does predict the dependent variable; that is why it's in the model. If your control explains most of the variance rather than your other variables, then it does, and that is worth noting. Things beyond policy makers' control are commonly critical in the economic literature.
 
#7
Thanks again for replying... I have completed a stepwise forward logistic regression based on the likelihood ratio. I think the final model in the list = best model? If I add age to this, it removes a significant variable. I don't want to include age in the model because I am looking at 5 distinct variables that are thought to predict death in a particular illness, and age is not one of those variables. Is it acceptable to exclude controls like sex and age, or do they have to be included? Is it enough to report the odds ratios and confidence intervals of the significant variables, or do I need to explain anything else from the output?
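For the reporting question, a hedged sketch of how odds ratios and 95% confidence intervals come out of a fitted logistic model (hypothetical data, file, and variable names, not the actual final model):

```python
# Hedged sketch of pulling odds ratios and 95% CIs out of a fitted
# logistic regression; model and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weight_loss_study.csv")   # hypothetical file
final = smf.logit("lost_weight ~ appetite + diabetes_score + crp + fatigue",
                  data=df).fit()

table = final.conf_int()                     # 95% CI for the log-odds coefficients
table["coef"] = final.params
table.columns = ["2.5%", "97.5%", "coef"]

print(np.exp(table[["coef", "2.5%", "97.5%"]]))   # odds ratios with 95% CIs
# Reviewers usually also want n, the overall model test, a goodness-of-fit
# statistic, and a statement of how the model was built (e.g. stepwise LR).
```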