Hi all - I'm a student and have a question about statistical analysis.

I'd like to compare post-treatment outcomes between two groups:

**Group A**, who received a traditional drug (n = 200), and **Group B**, who received a newer drug (n = 100).

Some post-treatment outcomes are continuous, whereas others are binary (e.g. % developing organ failure, mortality). The question is: which drug is superior? Group allocation was **non-random**, i.e. based on doctors' choices/clinical factors - more of an observational study than an RCT.

It has been suggested to me that I should code the drug such that Group A is '0' and Group B is '1', and run regression models to evaluate the explanatory power of drug choice on the various outcomes (linear for continuous dependent variables, logistic for binary dependent variables).

My understanding so far is that regression has 2 advantages: it lets me control for confounders, and it shows how the other variables are associated with the outcome.
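As a sketch of what "controlling for confounders" means here, the example below fits the outcome on both the drug indicator and age. Everything in it is an assumption for illustration: the data are made up so that the outcome depends only on age and older patients mostly receive the newer drug, and `ols` is a hypothetical helper written for this post (a real analysis would use a statistics package, which also reports standard errors and P-values).

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data, NOT real study data: the outcome depends on age only,
# and the newer drug (1) is given mostly to older patients.
age = [40, 45, 50, 55, 60, 65, 70, 75]
drug = [0, 0, 0, 1, 0, 1, 1, 1]
outcome = [10 + 0.2 * a for a in age]

# Unadjusted comparison: difference in mean outcome, B minus A.
mean_b = sum(o for o, d in zip(outcome, drug) if d == 1) / drug.count(1)
mean_a = sum(o for o, d in zip(outcome, drug) if d == 0) / drug.count(0)
unadjusted = mean_b - mean_a

# Adjusted comparison: design matrix columns are intercept, drug, age.
X = [[1, d, a] for d, a in zip(drug, age)]
b0, b_drug, b_age = ols(X, outcome)

print(f"unadjusted difference (B - A): {unadjusted:.3f}")
print(f"drug coefficient, adjusted for age: {b_drug:.3f}")
print(f"age coefficient: {b_age:.3f}")
```

In this constructed example the unadjusted comparison shows a difference between the groups, while the age-adjusted drug coefficient is essentially zero, because the whole difference comes from age.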

However, I also read that linear regression models have 5 assumptions. When I began testing my models against these, I found that the residuals are not normally distributed (and so homoscedasticity also becomes hard to test). I have neither the time nor the skill to transform / manipulate my data. The other issue is that some outcomes have far fewer data points (n ≈ 40).

**My question is: would a t-test/Mann-Whitney U (depending on normality), comparing mean values for the continuous outcomes between Group A and Group B, be okay in this scenario, instead of regression?**

I've done demographic comparisons and only age differs between the groups; can I report the t-test/MWU P-values with the caveat that any significant results could also be due to differences in age? And for binary outcomes like % organ failure, I could just use 2-tailed tests of proportion?

Would be grateful for some help! I think some versions of this question have been asked before, but I couldn't find a conclusive answer.
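For reference, this is roughly what I mean by a 2-tailed test of proportions: a pooled two-proportion z-test, sketched with the standard library only. The group sizes are the ones from my study, but the event counts are made up, and `two_proportion_z_test` is just a name I chose for this sketch. It relies on a normal approximation and does not adjust for age or anything else.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-tailed P-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Group sizes from the post; the event counts are hypothetical.
z, p = two_proportion_z_test(events_a=40, n_a=200, events_b=10, n_b=100)
print(f"z = {z:.3f}, two-tailed P = {p:.4f}")
```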
