two continuous & two categorical variables: how to compare

Dear all,

I have the following biological data:

  • 1st variable (continuous, normally distributed): concentration of a hormone in blood
  • 2nd variable (continuous): age of donors
  • 3rd variable (categorical): genotype (control vs mutation-carrier)
  • 4th variable (categorical): gender (male vs female)

I want to test whether there is a difference in the hormone levels between control donors and mutation carriers, but age and gender might bias the direct comparison using a t-test. For instance, I see that age affects the hormone levels in the control group, but not in the mutation group.

What sort of normalization/correction should I apply before comparing hormone levels between the control and mutation groups?

Thanks a lot for helping!

Best wishes,

University of Luebeck, Germany


TS Contributor
I think the best approach would be to simply build a regression model with one continuous and two categorical predictors, including the interactions among them. That will give you all the detailed information you need.
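A minimal sketch of such a model, assuming Python with NumPy and simulated data (none of the numbers come from the original post). The helper name `interaction_design` and the 0/1 coding (`mutation` = 1 for carriers, `female` = 1 for women) are hypothetical choices for illustration. The model is hormone ~ age * genotype * gender, i.e. all main effects plus all two- and three-way interactions, fitted by ordinary least squares:

```python
import numpy as np

def interaction_design(age, mutation, female):
    """Design matrix for hormone ~ age * genotype * gender (all interactions).

    Hypothetical 0/1 coding: mutation = 1 for carriers, female = 1 for women.
    Age is centered so the genotype and gender terms are read at the mean age.
    """
    a = age - age.mean()
    columns = {
        "intercept": np.ones_like(a),
        "age": a,
        "mutation": mutation,
        "female": female,
        "age:mutation": a * mutation,
        "age:female": a * female,
        "mutation:female": mutation * female,
        "age:mutation:female": a * mutation * female,
    }
    X = np.column_stack([np.asarray(c, dtype=float) for c in columns.values()])
    return list(columns), X


# Simulated (not real) data: age matters for controls only, as described in the question.
rng = np.random.default_rng(0)
n = 400
age = rng.uniform(20, 70, n)
mutation = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
names, X = interaction_design(age, mutation, female)
true_beta = np.array([10.0, 0.05, 2.0, 1.0, -0.05, 0.0, 0.0, 0.0])
hormone = X @ true_beta + rng.normal(0, 0.2, n)

# Ordinary least-squares fit of the full-interaction model
beta, *_ = np.linalg.lstsq(X, hormone, rcond=None)
for name, b in zip(names, beta):
    print(f"{name:>22s} {b: .3f}")
```

With this coding, the `mutation` coefficient is the carrier-minus-control difference for men at the mean age. Once interactions are in the model, the genotype effect depends on age and gender, so it has to be read together with the `age:mutation`, `mutation:female` and three-way terms rather than on its own.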
Thanks for your suggestion, rogojel!

What I've learned so far is that I can use analysis of covariance (ANCOVA), with age as a covariate and the two categorical variables as factors. However, it assumes that my continuous data can be modelled with a linear regression. As far as I understand, I should first check whether linear regression is appropriate for my data by analyzing residual plots, and I should check for potential outliers (using Mahalanobis’ distance or Cook’s D).
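Standard ANCOVA also assumes that the age slope is the same in both genotype groups (homogeneity of regression slopes), and the question above reports that age affects hormone levels in controls but not in carriers. A minimal sketch of checking this, assuming Python with NumPy and simulated data: fit a common-slope model and a model that adds an age × genotype term, then compare them with a nested-model F test. The helper names `rss` and `nested_f_test` are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def nested_f_test(X_reduced, X_full, y):
    """F statistic comparing a reduced model against a full model that contains it."""
    df1 = X_full.shape[1] - X_reduced.shape[1]
    df2 = len(y) - X_full.shape[1]
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, df1, df2

# Simulated (not real) data: the age slope is 0.06 in controls and 0 in carriers.
rng = np.random.default_rng(2)
n = 200
age = rng.uniform(20, 70, n)
mutation = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
hormone = (10 + 0.06 * age * (1 - mutation) + 2 * mutation + 0.8 * female
           + rng.normal(0, 0.5, n))

ones = np.ones(n)
X_ancova = np.column_stack([ones, age, mutation, female])  # one common age slope
X_sep = np.column_stack([X_ancova, age * mutation])        # slope may differ by genotype
F, df1, df2 = nested_f_test(X_ancova, X_sep, hormone)
print(f"F({df1}, {df2}) = {F:.1f}")
```

If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the p-value. A large F suggests the parallel-slopes assumption does not hold, in which case keeping the age × genotype interaction (as in the regression model suggested above) is more defensible than a single age adjustment.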
Am I thinking in the right direction?
Is there anything I'm still missing here?

Thank you!


TS Contributor
That makes sense. To get residuals you first have to fit the regression model; the same goes for Cook's distance.
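A minimal sketch of that order of operations (fit first, then diagnose), assuming Python with NumPy and simulated data with one deliberately planted outlier. The helper names `ols_diagnostics` and `mahalanobis_sq` are hypothetical. Cook's distance is computed as D_i = e_i² / (p·s²) · h_ii / (1 − h_ii)², where e_i is the residual, h_ii the leverage, p the number of coefficients and s² the residual variance. Mahalanobis distance is computed here on (age, hormone) jointly, which is one of several reasonable choices:

```python
import numpy as np

def ols_diagnostics(X, y):
    """Fit OLS, then return fitted values, residuals, leverage and Cook's distance."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    # Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    cooks_d = resid**2 / (p * s2) * leverage / (1 - leverage) ** 2
    return fitted, resid, leverage, cooks_d

def mahalanobis_sq(Z):
    """Squared Mahalanobis distance of each row of Z from the column means (Z needs >= 2 columns)."""
    centered = Z - Z.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, S_inv, centered)

# Simulated (not real) data with one deliberately planted outlier at index 0.
rng = np.random.default_rng(1)
n = 120
age = rng.uniform(20, 70, n)
mutation = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
hormone = 10 + 0.05 * age + 2 * mutation + female + rng.normal(0, 0.5, n)
hormone[0] += 8

X = np.column_stack([np.ones(n), age, mutation, female])
fitted, resid, leverage, cooks_d = ols_diagnostics(X, hormone)
d2 = mahalanobis_sq(np.column_stack([age, hormone]))

flagged = np.flatnonzero(cooks_d > 4 / n)  # common rule-of-thumb cutoff, not a formal test
print("Cook's D flagged:", flagged)
print("largest Mahalanobis d^2 at index", d2.argmax())
```

The residual plot itself is then just `fitted` against `resid` (e.g. a scatter plot with matplotlib); curvature or a funnel shape there argues against the linear model. The 4/n cutoff is only a screening heuristic: flagged points should be inspected, not automatically removed.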