Question about Regression and relationship testing

#1
Hello,

I am working on a project that is looking at a possible correlation between BMI and various outcomes in a patient group.

So the BMI is normally distributed, nominal data,
but my outcome variables include length of hospital stay, number of complications, number of procedures done, etc., which are not normally distributed.

I was wondering what the best test is to find a relationship between BMI and these outcomes.

I believe most of my outcomes are nominal, and a few are categorical.

I was thinking of doing some logistic regressions, but that requires dichotomous dependent variables and I'm not sure how to make that fit my existing data. Would an ordered logistic regression work? But then my outcomes are not ordered data.

Thanks for all the help
 

trinker

ggplot2orBust
#2
artaxerxes said:
So the BMI is normally distributed, nominal data,
but my outcome variables include length of hospital stay, number of complications, number of procedures done, etc., which are not normally distributed.

I was wondering what the best test is to find a relationship between BMI and these outcomes.
Regression would be a candidate (likely multivariate, since you are testing multiple outcomes and wouldn't want to inflate the family-wise error rate). Regression makes no assumption about the normality of the data themselves. The assumption is about the distribution of the errors (residuals).
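
To make the family-wise error point concrete, here is a minimal sketch of how the chance of at least one false positive grows when several outcomes are each tested separately. It assumes the tests are independent, which is a simplification, and the function name and the alpha of 0.05 are only illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative only: one test per outcome, each at alpha = 0.05.
for n_outcomes in (1, 3, 5, 10):
    rate = familywise_error_rate(0.05, n_outcomes)
    print(f"{n_outcomes} outcomes -> family-wise error rate {rate:.3f}")
```

With correlated outcomes the real rate will differ, but the direction is the same: more separate tests means more chances of a spurious finding.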

I was thinking of doing some logistic regressions, but that requires dichotomous dependent variables and I'm not sure how to make that fit my existing data.
This is not accurate. Logistic regression can also be used with outcomes that are not dichotomous (multinomial logistic regression).

I hope this helps in making a decision. :)
 
#3
thank you very much for the reply,

One question I still have: if my dependent variable is a continuous variable, how do I go about performing logistic regression?

For example, I am trying to relate BMI (independent) to ICU length of stay. Would I not have to break up ICU LOS into categories before regressing? Is there a way to directly regress one continuous variable on the other, assuming the residuals are not normally distributed?

And does it matter if a count variable is used instead of a continuous one for the dependent variable?
 

Link

Ninja say what!?!
#4
hmmm...I hope you don't mind me asking. What are nominal variables? How are they different from categorical? hint

You might want to look closer at the distribution of your BMI variable. In my own experience, it has always had a long right-sided tail (from those who are extremely obese).

If you have a continuous outcome variable, how come you're interested in logistic regression?

You could easily just use OLS to regress ICU length of stay on BMI (if both are continuous).
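
As a rough illustration of that OLS suggestion, here is a minimal sketch using only the Python standard library. The BMI and ICU values are made up for the example, and `ols_fit` and `residuals` are hypothetical helper names, not functions from any package. In practice a statistics package would also give standard errors and p-values.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values; these are what the normality assumption is about."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up illustrative data: BMI (predictor) and ICU length of stay in days (outcome).
bmi = [41.0, 43.5, 45.2, 47.8, 50.1, 52.6, 55.0, 58.3]
icu_days = [2.0, 3.0, 2.5, 4.0, 3.5, 5.0, 6.5, 6.0]

intercept, slope = ols_fit(bmi, icu_days)
resid = residuals(bmi, icu_days, intercept, slope)
print(f"ICU days = {intercept:.2f} + {slope:.3f} * BMI")
```

The `resid` list is the thing to inspect (histogram, Q-Q plot) when checking assumptions, not the raw outcome values.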
 

Karabiner

TS Contributor
#5
Perform linear regressions with your interval scaled outcomes.

Perform binary or multinomial logistic regressions with your
binary or categorical outcomes.

For event data such as complications, a Poisson regression might
be applicable.
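
A minimal sketch of what a Poisson regression does for a count outcome such as number of complications, assuming a single predictor and a log link. It is fitted here by Newton-Raphson using only the standard library; the data are made up and `poisson_fit` is a hypothetical helper name. A real analysis would use a statistics package and also check for overdispersion.

```python
import math


def poisson_fit(x, y, n_iter=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system directly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up illustrative data: BMI and number of complications per patient.
bmi = [41.0, 43.5, 45.2, 47.8, 50.1, 52.6, 55.0, 58.3]
complications = [0, 1, 0, 2, 1, 3, 2, 4]

# Centering BMI keeps the exponentials in a comfortable numeric range.
bmi_mean = sum(bmi) / len(bmi)
bmi_centered = [b - bmi_mean for b in bmi]

b0, b1 = poisson_fit(bmi_centered, complications)
print(f"rate ratio per BMI unit: {math.exp(b1):.3f}")
```

The exponentiated slope is read as a rate ratio: the multiplicative change in the expected count for each one-unit increase in BMI.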

Do not break continuous measures into categories, at least not without a
very good reason.

Also, please read the answers you get carefully: Trinker already stated
that it is not the data that need to be normally distributed, but the
prediction errors (residuals). In addition, normality of the residuals
matters less when the sample size is large.

Variables which measure time (here: length of stay) often are markedly
skewed; if you have a small sample size and non-normal residuals, a data
transformation (e.g. logarithmic transformation) can possibly normalize
the residuals.
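
A small sketch of the effect of a logarithmic transformation on a right-skewed variable, using only the standard library. The length-of-stay values are made up, and `skewness` is a hypothetical helper that computes the plain moment-based skewness. Note that the logarithm needs strictly positive values, so stays recorded as 0 would need separate handling.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up length-of-stay values in days, with a long right tail.
los_days = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 30]
log_los = [math.log(d) for d in los_days]

print(f"skewness of raw LOS: {skewness(los_days):.2f}")
print(f"skewness of log LOS: {skewness(log_los):.2f}")
```

The transformed outcome would then be used in the regression, and the residuals of that new model checked again. Coefficients are then on the log scale and have to be interpreted accordingly.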

Kind regards

K.
 
#6
thanks for the replies

I just checked the residuals, and for several of the regressions they failed the Shapiro-Wilk test of normality. At what point would you disregard normality of residuals? I have 1300 or so data points.

Regarding BMI, I am studying a subset of morbidly obese patients, and their BMI distribution does not appear to be skewed.