Measurement error in binary outcome variable for a case-control study

I am currently working on a case-control study of a rare disease using a large patient database. Using conditional logistic regression, I will estimate odds ratios for several exposures by comparing cases and controls. The validity of the records for the disease of interest (the outcome) is unknown.

I have found around 550 potential cases with a record for the outcome of interest. Two specialists are going to divide those 550 patients into probable and unlikely cases based on the information they have. For 100 of those 550 potential cases, I additionally have discharge letters (which I did not give to the specialists) which confirm or refute the diagnosis. With those discharge letters I can estimate the sensitivity and specificity of the specialists' classification.
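For the validation step, the calculation is a straightforward 2x2 comparison of the specialists' verdicts against the discharge letters. A minimal Python sketch, with entirely hypothetical counts:

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity of the specialists' classification,
    using the discharge letters as the gold standard.

    tp: specialist says probable, letter confirms the diagnosis
    fn: specialist says unlikely, letter confirms the diagnosis
    fp: specialist says probable, letter refutes the diagnosis
    tn: specialist says unlikely, letter refutes the diagnosis
    """
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for the 100 validated patients:
se, sp = sens_spec(tp=45, fn=5, fp=4, tn=46)  # se = 0.90, sp = 0.92
```

With only 100 validated records, it would also be worth reporting confidence intervals for the sensitivity and specificity (e.g. Wilson intervals) rather than the point estimates alone.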

However, from what I know I can assume that the sensitivity and specificity will be fairly high but not 100%, meaning that my results will be biased by misclassified cases. I have been looking for literature on outcome misclassification and found a paper describing a procedure for cohort studies (Magder and Hughes, 1997, "Logistic regression when the outcome is measured with uncertainty"). This technique, however, is not applicable to case-control studies. The paper is attached.

How can I address this issue? Is there a possibility to adjust for outcome misclassification bias in a case-control study if the sensitivity and specificity for the outcome variable is known?
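For what it is worth, the quantitative bias analysis literature does describe a simple back-calculation of the expected true cell counts from the observed 2x2 table when the sensitivity and specificity of outcome classification are known. The sketch below uses made-up counts, assumes nondifferential misclassification (the same se/sp in both exposure groups), and treats the study sample itself as the group being reclassified, which is an extra assumption in a case-control design:

```python
def corrected_cells(obs_cases, obs_noncases, se, sp):
    """Expected true case/non-case counts within one exposure stratum,
    given classification sensitivity se and specificity sp
    (requires se + sp > 1)."""
    total = obs_cases + obs_noncases
    true_cases = (obs_cases - (1 - sp) * total) / (se + sp - 1)
    return true_cases, total - true_cases

def corrected_or(a, b, c, d, se, sp):
    """a, c: observed cases and controls among the exposed;
    b, d: observed cases and controls among the unexposed."""
    A, C = corrected_cells(a, c, se, sp)  # exposed stratum
    B, D = corrected_cells(b, d, se, sp)  # unexposed stratum
    return (A * D) / (B * C)

crude = (100 * 400) / (50 * 200)                              # 4.0
adjusted = corrected_or(100, 50, 200, 400, se=0.90, sp=0.95)  # ~6.45, further from the null
```

A corrected count that comes out negative signals that the assumed sensitivity/specificity are inconsistent with the observed data, which is a useful sanity check in itself.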

Thank you very much for your help!


Less is more. Stay pure. Stay poor.
"biased by misclassified cases" please describe what exactly you mean here!

Are your predictors something other than the specialists conclusions. So are you just using the specialist conclusions as a validation for the outcome, or are these conclusions what you are testing.

"Is there a possibility to adjust for outcome misclassification bias in a case-control study if the sensitivity and specificity for the outcome variable is known?" See my last comment, because your sentence is confusing.

Recommendation, Look through or search this topic in the American Journal of Epidemiology. They have had many articles on this topic. What you will most likely need to do is Sensitivity Analysis. This is not the same thing as the Sensitivity you described. It is how much assumptions can be off or misclassification can exist and you still have robust enough results to support your hypothesis. So how sensitive your results are to issues.
Thank you for your reply. I apologise if I have not made myself clear; I started working in this field only recently, so I may have confused some terms.

We have some predictors for our outcome in the form of symptoms or treatments recorded in the patient's history around the time of the record for the disease of interest. The specialists will classify the patients based on that information.

The potential exposures that cause an onset of the disease are well known and are recorded very well in the database. However, little is known about the relative risks associated with each of those exposures. My problem is that my study will most likely include some cases who actually suffer from a different disease with a similar clinical presentation but different causes. My odds ratios will consequently be biased towards 1.00 (the null).

I therefore wanted to know if I can account for this if I know that, say, 10% of my cases are not true cases. I was also thinking that I could divide my cases into groups based on the level of certainty I have.
E.g.: patients with at least 3 symptoms are classified with a sensitivity and specificity of X%/Y%, patients with 1 or 2 symptoms with Z%/W%, etc.

I know I could use this for a sensitivity analysis, but I am not sure whether I would still have enough cases in each category. I am not a statistician, but I was thinking that it might be possible to handle this in another way in the analysis.
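One way to get a feel for this without thinning the case groups too much is a simple deterministic sensitivity analysis: sweep the corrected odds ratio over a grid of plausible sensitivity/specificity values and report the resulting range. A self-contained sketch (the counts and the grid values are invented; the correction assumes nondifferential outcome misclassification):

```python
from itertools import product

def corrected_or(a, b, c, d, se, sp):
    """Odds ratio after back-calculating expected true counts from the
    observed 2x2 table (a, c: exposed cases/controls; b, d: unexposed)."""
    def cells(cases, noncases):
        total = cases + noncases
        true_cases = (cases - (1 - sp) * total) / (se + sp - 1)
        return true_cases, total - true_cases
    A, C = cells(a, c)
    B, D = cells(b, d)
    return (A * D) / (B * C)

# Sweep a grid of plausible validation values (hypothetical numbers):
results = {(se, sp): corrected_or(100, 50, 200, 400, se, sp)
           for se, sp in product((0.85, 0.90, 0.95), (0.90, 0.95, 1.00))}
lo, hi = min(results.values()), max(results.values())
# Report the range (lo, hi) alongside the crude odds ratio of 4.0.
```

Reporting the full range under a grid of assumed values sidesteps the small-cell problem of stratifying the cases, at the cost of not producing a single "adjusted" estimate.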

I will follow your advice and search this topic in the American Journal of Epidemiology. Thank you!


It sounds like you have some good ideas for approaching the problem. I have not dealt with this problem in particular, but as mentioned, I bet you can find a good article in AJE. And to be precise about the direction of the bias: if the misclassification is roughly random with respect to exposure (nondifferential), it will bias your odds ratios toward the null rather than simply nullify your results.

So your controls will not have the outcome, and your case group will have some individuals misclassified as cases, correct?

Perhaps, similar to what you wrote, you could create a threshold on the classification certainty and then fit a polytomous (multinomial) logistic regression with three outcome categories (controls, possibly misclassified cases, probable cases). I would also try to draw a causal diagram to illustrate your problem; maybe when you draw it, you will recognize a collider or a spot where a confounder acts. Given this, the literature on unmeasured confounders may also help you tackle your problem.
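On the regression idea, here is what a polytomous (multinomial) logit with those three outcome categories could look like. This is a from-scratch NumPy sketch on synthetic data; the data-generating coefficients are invented purely for illustration. In practice one would use an existing routine (e.g. statsmodels' MNLogit), and a matched case-control design would additionally call for a conditional analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one exposure x, three outcome categories:
# 0 = control, 1 = possibly misclassified case, 2 = probable case.
n = 600
x = rng.normal(size=n)
true_logits = np.column_stack([np.zeros(n), 0.5 * x - 1.0, 1.2 * x - 1.0])
probs = np.exp(true_logits) / np.exp(true_logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

# Fit a multinomial (polytomous) logit by gradient descent,
# with category 0 (controls) as the reference.
X = np.column_stack([np.ones(n), x])   # intercept + exposure
W = np.zeros((2, 2))                   # one column per non-reference category
Y = np.eye(3)[y][:, 1:]                # one-hot outcome, reference dropped

for _ in range(3000):
    eta = np.column_stack([np.zeros(n), X @ W])
    P = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
    W -= X.T @ (P[:, 1:] - Y) / n      # gradient of the mean neg. log-likelihood

# W[1, 0] and W[1, 1] are the log odds ratio estimates of the exposure for
# the two case categories versus controls.
```

If the exposure effect is real, one would expect a dose-response-like pattern: a stronger estimated log odds ratio for the "probable case" category than for the uncertain one.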