Appropriate logistic regression model to control for confounders?

I am working with a cross-sectional study sample (N=601). The goal of my research is to investigate whether there is an association between a disease (X) and another disease (Y). I have used cross-tabulation with chi-square testing to confirm that X and Y are associated in an unadjusted (bivariate) analysis. I have also used cross-tabulation and t-tests to investigate whether other clinical variables of interest are associated with either X or Y.
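For reference, the unadjusted chi-square test on a 2x2 crosstab can be done by hand in a few lines. The counts below are made up (they just sum to 601); for one degree of freedom the p-value follows from the chi-square/normal relationship, so nothing beyond numpy and the standard library is needed:

```python
import math
import numpy as np

# Hypothetical 2x2 crosstab of disease X (rows) by disease Y (columns)
table = np.array([[210, 90],     # X absent:  Y absent, Y present
                  [140, 161]])   # X present: Y absent, Y present

row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
expected = row @ col / table.sum()            # expected counts under independence
chi2 = ((table - expected) ** 2 / expected).sum()

# For a 2x2 table df = 1, so chi2 = Z^2 and the p-value is erfc(sqrt(chi2 / 2))
p = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

In practice you would use your statistics package's chi-square routine, which also handles larger tables and continuity corrections.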

At this point I would like to investigate if disease X is independently associated with disease Y, even after controlling for some other clinical variables. My tutor recommended the use of logistic regression for this procedure.

My thinking so far is that in order to control for other variables I have to force them into the logistic regression model. If I use forward or backward selection (or if I experiment with removing variables based on clinical judgement to find the best model), some variables will be excluded, and because of this I cannot claim to control for them in the final model. Can I control for a variable even if it is not in the final model? I have searched the internet and a couple of textbooks for two days straight trying to find an answer to this question.

Process for figuring out the variables I want to control for:

1. Decided which variables could plausibly be associated with X or Y
2. Tested these associations for significance with crosstabs and t-tests
3. Picked a rough list of variables with p < 0.150
4. Excluded some variables because of too many missing values relative to N
5. Excluded some variables because of suspected data collection errors
6. Excluded three variables that are part of the definition of Y (I don't know if this is correct. Is it?)
7. Grouped variables with obvious collinearity, picked the most representative one from each group, and excluded the others to avoid multicollinearity

----> The final list that I want to force into the logistic regression model together with disease X, age, and sex
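Some of the screening steps above (the missing-data filter and the collinearity grouping in particular) are easy to automate. A minimal sketch with simulated data, where the variable names, thresholds, and missingness pattern are all hypothetical; the p-value screen from step 2 would be done separately with chi-square/t-tests:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 601

# Hypothetical candidate covariates; crp has heavy injected missingness (NaN)
data = {
    "age": rng.normal(60, 10, n),
    "bmi": rng.normal(27, 4, n),
    "crp": np.where(rng.random(n) < 0.40, np.nan, rng.normal(5, 2, n)),
}
data["weight"] = data["bmi"] * 2.9 + rng.normal(0, 1, n)  # nearly collinear with bmi

# Drop variables with too many missing values relative to N
max_missing_frac = 0.20
kept = {k: v for k, v in data.items() if np.isnan(v).mean() <= max_missing_frac}

# Among kept variables, flag pairs with obvious collinearity (|r| > 0.8);
# from each flagged group you would keep only the most representative variable
names = list(kept)
X = np.column_stack([kept[k] for k in names])
r = np.corrcoef(X, rowvar=False)
collinear_pairs = [(names[i], names[j])
                   for i in range(len(names)) for j in range(i + 1, len(names))
                   if abs(r[i, j]) > 0.8]

print("kept after missingness filter:", names)
print("collinear pairs to resolve:", collinear_pairs)
```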
Does anything seem fishy?

You are right about the "forcing" of the control variables. In this situation, you add your covariates to the model to determine whether your variable of interest (disease X) is still significantly associated with Y.

A very common approach is to run your model (or best model, if you are doing model building) once without the control variables, then run it again with the control variables to see how the coefficient estimates change.
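That with/without comparison can be sketched as follows. The data and effect sizes are simulated (age confounds the X-Y association by construction), and the model is fit with a plain numpy Newton-Raphson routine so the sketch is self-contained; in practice you would use your statistics package's logistic regression command:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X must include an intercept column; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: solve (X'WX) delta = X'(y - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
n = 601
age = rng.normal(60, 10, n)
# X is more likely at higher age; Y depends on both X and age (age confounds)
x = (rng.random(n) < 1 / (1 + np.exp(-(age - 60) / 10))).astype(float)
logit_y = -1.0 + 0.8 * x + 0.08 * (age - 60)
y = (rng.random(n) < 1 / (1 + np.exp(-logit_y))).astype(float)

ones = np.ones(n)
b_crude = fit_logit(np.column_stack([ones, x]), y)           # without controls
b_adj = fit_logit(np.column_stack([ones, x, age - 60]), y)   # age forced in

print(f"crude    OR for X: {np.exp(b_crude[1]):.2f}")
print(f"adjusted OR for X: {np.exp(b_adj[1]):.2f}")
```

With data generated this way, the crude odds ratio for X is inflated by the age confounding, and adjusting for age pulls it back toward the true effect; that shift between the two fits is exactly what the with/without comparison shows.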

In terms of how to choose these control variables, they should come from theory and should make sense; otherwise, nobody will care that you took the time to control for them. Age and sex are common control variables, as are income, education, race, diet, smoking, etc.

Hope that helps.