I have data from a frequency-matched case-control study. The only detail that matters here is that the outcome variable is artificially balanced (50:50), whereas the true ratio is 10:90. When you run logistic regression on these data, all outputs are correct (when generalized to the original unbalanced data or to new unbalanced data) except the intercept and anything calculated from it, i.e. the predicted probabilities. As a side note, the predicted probabilities are in the correct rank order, but they need to be transformed to give the correct values for the overall unbalanced data. There are formulae for this; I will post links.
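
For reference, here is a minimal sketch of the kind of correction I mean, assuming the usual prior correction for case-control sampling. The function name and the parameters `sample_rate` and `true_rate` are my own for illustration and do not come from any library; the defaults are the 50:50 and 10:90 ratios above.

```python
def correct_probability(p, sample_rate=0.5, true_rate=0.1):
    """Rescale a probability from the balanced model to the true event rate.

    p           -- probability predicted by the model fit on balanced data
    sample_rate -- event fraction in the balanced training sample (assumed 0.5)
    true_rate   -- event fraction in the real population (assumed 0.1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # Reweight each outcome by (population fraction / sample fraction).
    event = p * (true_rate / sample_rate)
    non_event = (1.0 - p) * ((1.0 - true_rate) / (1.0 - sample_rate))
    return event / (event + non_event)


# Illustrative inputs only; these are not results from my data.
for p in (0.2, 0.5, 0.8):
    print(p, round(correct_probability(p), 4))
```

With these defaults a balanced-model probability of 0.5 maps back to 0.1, the assumed true event rate, and the transform is monotone, so the rank order is unchanged.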

However, I want to score a new dataset with this model, and the issue is that the new dataset has also been artificially balanced (50:50). Has anyone else had this scenario? If so, can I just score the new data and then apply the correction?

Model(Balanced training data) -> score(Balanced validation data) -> correct predictions (Balanced validation data)
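
To make the pipeline concrete, here is a rough sketch of the mechanics only, written as an intercept offset applied before scoring. It assumes the standard case-control offset; the coefficients and validation rows are made-up placeholders, not output from my model. It does not settle whether the corrected values are appropriate for a validation set that is itself balanced, which is my actual question.

```python
import math


def corrected_intercept(b0, sample_rate=0.5, true_rate=0.1):
    """Shift the balanced-data intercept toward the assumed true event rate."""
    offset = math.log(
        ((1.0 - true_rate) / true_rate) * (sample_rate / (1.0 - sample_rate))
    )
    return b0 - offset


def score(row, intercept, coefs):
    """Logistic prediction for one row of predictor values."""
    z = intercept + sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values from the balanced training data.
b0, coefs = 0.0, [0.8, -0.5]

# Hypothetical rows from the balanced validation data.
validation_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

b0_adj = corrected_intercept(b0)
corrected = [score(row, b0_adj, coefs) for row in validation_rows]
print(corrected)
```

The correction is applied row by row and uses only the sampling fractions of the training data, so mechanically nothing stops it from being run on the balanced validation set.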

Thanks!
