Hello internet,
let's say I want to predict an outcome using multiple categorical predictors. Please consider the following reproducible R code:
As you can see, I wish to predict loneliness (no/yes) from the kind of pet the subject has (dog/cat), the subject's level of addiction (none/low/high), and the subject's biological sex (male/female). So far, so good. I've dummy-coded each of these predictors, which gives one variable per predictor level to use in my model. I know that one dummy variable per predictor is usually omitted because it represents the baseline and is expressed by the intercept. However, I find this problematic with multiple categorical predictors, since the intercept would then represent the combination of one level of each categorical predictor (e.g. intercept = pet_dog + addiction_none + sex_male). I do not want my intercept to represent this combination. In fact, I do not care about the intercept at all, since I merely wish to evaluate the impact of each category on the outcome. How is this usually achieved?
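To make the baseline issue concrete, here is a minimal sketch in base R. It uses a made-up data frame called `toy` (hypothetical, not my real data) and assumes R's default treatment contrasts. It checks two things: that a full set of dummy columns plus an intercept is linearly dependent, and that the intercept of the default fit equals the predicted log-odds for the reference combination dog / none / male.

```r
# Hypothetical toy data, only for illustration
set.seed(1)
n <- 200
toy <- data.frame(
  loneliness = rbinom(n, 1, 0.5),
  pet        = factor(sample(c('dog', 'cat'), n, replace = TRUE),
                      levels = c('dog', 'cat')),
  addiction  = factor(sample(c('none', 'low', 'high'), n, replace = TRUE),
                      levels = c('none', 'low', 'high')),
  sex        = factor(sample(c('male', 'female'), n, replace = TRUE),
                      levels = c('male', 'female'))
)

# Intercept plus one dummy column for EVERY level of every factor
X_full <- cbind(
  intercept = 1,
  model.matrix(~ pet - 1, toy),
  model.matrix(~ addiction - 1, toy),
  model.matrix(~ sex - 1, toy)
)
# The rank is smaller than the number of columns, so not all
# coefficients can be estimated at once
print(c(columns = ncol(X_full), rank = qr(X_full)$rank))

# Default fit: the first level of each factor is absorbed into the intercept
fit <- glm(loneliness ~ pet + addiction + sex, data = toy, family = binomial)

# Predicted log-odds for the reference combination dog / none / male
ref <- data.frame(
  pet       = factor('dog',  levels = levels(toy$pet)),
  addiction = factor('none', levels = levels(toy$addiction)),
  sex       = factor('male', levels = levels(toy$sex))
)
ref_logit <- predict(fit, newdata = ref, type = 'link')

# Compare the intercept with that prediction
print(all.equal(coef(fit)[['(Intercept)']], unname(ref_logit)))
```

So under this coding the intercept is not an interaction term. It is simply the fitted value for the subject whose factors all sit at their first level.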
Code:
library(dplyr)
library(psych)

set.seed(42)
n <- 500

# Simulated data: binary outcome and three categorical predictors
df <- data.frame(
  loneliness = round(runif(n, 0, 1)),
  pet = factor(round(runif(n, 0, 1))),
  addiction = factor(round(runif(n, 0, 2))),
  sex = factor(round(runif(n, 0, 1)))
)
levels(df$pet) <- c('dog', 'cat')
levels(df$addiction) <- c('none', 'low', 'high')
levels(df$sex) <- c('male', 'female')

# One dummy column per level of each predictor, prefixed with the predictor name
df_coded <- data.frame(
  loneliness = df$loneliness,
  psych::dummy.code(df$pet) %>% as_tibble() %>% setNames(paste0('pet_', names(.))),
  psych::dummy.code(df$addiction) %>% as_tibble() %>% setNames(paste0('addiction_', names(.))),
  psych::dummy.code(df$sex) %>% as_tibble() %>% setNames(paste0('sex_', names(.)))
)

# Build 'loneliness ~ <all dummy columns>'
model_formula <- as.formula(
  paste0(
    'loneliness ~ ',
    paste0(colnames(df_coded)[-1], collapse = ' + ')
  )
)

# Logistic regression. Every level has its own column here, so some
# coefficients are aliased and glm() reports them as NA.
model_fit <- glm(formula = model_formula, data = df_coded, family = binomial)
print(summary(model_fit))
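One option I have seen mentioned for this situation is sum-to-zero (deviation) coding via `contr.sum`. What follows is a minimal sketch under that assumption, not necessarily the standard answer. It works on the factors directly instead of hand-made dummies, and the names `d`, `fit_sum`, `fit_trt` and `level_effects` are made up for illustration. With this coding the intercept is the unweighted average log-odds over all level combinations, and each coefficient is a level's deviation from that average.

```r
# Hypothetical data with the same structure as the example
set.seed(42)
n <- 500
d <- data.frame(
  loneliness = rbinom(n, 1, 0.5),
  pet        = factor(sample(c('dog', 'cat'), n, replace = TRUE),
                      levels = c('dog', 'cat')),
  addiction  = factor(sample(c('none', 'low', 'high'), n, replace = TRUE),
                      levels = c('none', 'low', 'high')),
  sex        = factor(sample(c('male', 'female'), n, replace = TRUE),
                      levels = c('male', 'female'))
)

# Same model twice: default treatment coding and sum-to-zero coding
fit_trt <- glm(loneliness ~ pet + addiction + sex, data = d, family = binomial)
fit_sum <- glm(loneliness ~ pet + addiction + sex, data = d, family = binomial,
               contrasts = list(pet = 'contr.sum',
                                addiction = 'contr.sum',
                                sex = 'contr.sum'))

# The coding changes the meaning of the coefficients, not the fit itself
print(all.equal(deviance(fit_sum), deviance(fit_trt)))

# Under sum coding the last level of each factor has no coefficient of its
# own; its effect is minus the sum of the other levels' effects
b <- coef(fit_sum)
level_effects <- c(
  pet_dog        = b[['pet1']],
  pet_cat        = -b[['pet1']],
  addiction_none = b[['addiction1']],
  addiction_low  = b[['addiction2']],
  addiction_high = -(b[['addiction1']] + b[['addiction2']]),
  sex_male       = b[['sex1']],
  sex_female     = -b[['sex1']]
)
print(round(level_effects, 3))

# The intercept equals the mean predicted log-odds over all 12 combinations
grid <- expand.grid(pet = levels(d$pet),
                    addiction = levels(d$addiction),
                    sex = levels(d$sex))
grid_logit <- predict(fit_sum, newdata = grid, type = 'link')
print(all.equal(b[['(Intercept)']], mean(grid_logit)))

# Per-predictor likelihood-ratio tests, which do not depend on the coding
print(drop1(fit_sum, test = 'Chisq'))
```

If the goal is only to judge whether each predictor matters as a whole, the `drop1()` table may be enough, since it gives the same result whichever level is treated as the baseline.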
Thanks a lot!