Sensitivity and specificity

Hi, I have two tests, A and B, which are performed together on patients to give a high level of sensitivity and specificity. However, doing both tests A and B is expensive.

I want to work out the sensitivity and specificity of A individually and B individually to see if there is a statistically significant difference in sensitivity or specificity compared to A + B combined. I don't have another reference test or gold standard, though, so I would only be able to test individual A or B against A + B combined. What would be the best way to go about this?

Thanks in advance
Hi, thanks for the reply. Hmm, sorry, I'm not sure what you mean by the truth of the test? The A + B tests combined provide the highest level of sensitivity and specificity available, and I plan to perform Test A, then Test B, then Tests A + B combined on the same patient.


Less is more. Stay pure. Stay poor.
Yeah, I get that part. But you said you don't have a gold standard. How do you judge the accuracy of Test A and/or B?
Hi, I see what you mean. So when I say there isn't a gold standard, that's not strictly true: there is a gold standard, but it is only possible at autopsy, so it is not available to me as part of a dissertation. Previous studies using autopsies suggest the specificity and sensitivity of tests A + B combined is near 100%.
I therefore want to see the sensitivity and specificity of A or B against A and B combined, which I'm having to assume is giving me the truth.
I assume you have historical patient data with tests A and B and the presence of the condition? Assuming the population is not biased in any way, you could just compare the specificity/sensitivity of A alone and B alone. You could use Youden's J statistic for informedness (sensitivity + specificity - 1) to see which test is better.
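As a rough illustration, here is a minimal Python sketch of Youden's J from a 2x2 confusion table (all counts below are made up for the example):

```python
def youden_j(tp, fn, fp, tn):
    """Youden's J (informedness) from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1

# Hypothetical counts for two tests scored against the same reference
j_a = youden_j(tp=80, fn=20, fp=10, tn=90)  # Test A alone
j_b = youden_j(tp=90, fn=10, fp=25, tn=75)  # Test B alone
print(f"J(A) = {j_a:.2f}, J(B) = {j_b:.2f}")
```

A higher J means the test is more informative overall; J = 0 means it is no better than chance.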

I think more clarification is needed. What data do you have? Or are you trying to use meta-data from the literature?
Hi, OK, no problem.

At the moment I have no data as this will be empirical research.

So my research is looking for fractures on MRI radiology images.

I have Test A, which is a type of MRI scan that looks at the bones, and Test B, which is another type of MRI scan looking at fluid, i.e. blood from a fracture. These are performed together and interpreted by a clinician to decide if there is a fracture or not.

What I plan to do is give the clinician just the images from Test A and get them to decide if there is a fracture or not, then give them the images from Test B to decide if there's a fracture or not, and then give them both the A + B test images together to decide if there's a fracture or not. The scans will be for a number of patients and all randomised, so they won't know which scan goes with which patient, to avoid bias etc.

How will you determine the pool of patient scans to use? In particular, will you search for patients with a particular ICD-10 code? And if so, who will be your control patients, since they have to have the scans too?

I am guessing you plan to use retrospective data and not wait for hundreds of people to walk through the door. For these retrospective cases and controls, did everyone have both MRI scans? Also, how were the fractures originally/definitively diagnosed? Using those same scans? If so, your final conclusion would be: of patients already diagnosed with Test A and/or Test B, this is how the approaches perform when re-reviewed. So if the scans were poor, individually or in tandem, the first time around, they will be poor again in this study. Meaning you don't know what you don't know. I am not saying this is a bad study, but it will have limitations.

So you get all of the images and a pool of radiologists, randomize the images, and potentially have multiple radiologists diagnose them, with overlap in the images seen by multiple radiologists, in case some aren't as good or are using different criteria. You should keep all of the images and, if possible, run them through a machine learning algorithm (a convolutional neural network), since in the future that is what is going to be reviewing these images anyway, though you probably won't have enough images.

Yeah, there are tests to compare sensitivity, specificity, PPV, and NPV, since in theory all you are doing is comparing percentages. However, your gold standard is likely the same thing you are testing against. Your issue will be with false negatives, but perhaps you can argue those are likely trivial fractures that are not clinically relevant and will self-resolve; you still have to acknowledge your specificity and NPV estimates may be off. Your sensitivity and PPV can also be off if you are using the test as the gold standard and there is an artefact in the image.
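Since all three readings come from the same patients, the comparison is paired, and one standard option is McNemar's test on the discordant pairs (e.g. fractures that A + B found but A alone missed, and vice versa). A minimal exact-test sketch, with hypothetical counts:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from discordant pair counts.

    b = cases positive on one test but negative on the other
    c = cases with the opposite discordance
    Under the null, b ~ Binomial(b + c, 0.5); we double the smaller tail.
    """
    n = b + c
    k = min(b, c)
    # probability of a result at least as extreme as k in the smaller tail
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(2 * tail, 1.0)

# Hypothetical: 3 fractures seen on A alone but not flagged by A+B reading,
# 11 fractures seen on the A+B reading but missed by A alone
print(mcnemar_exact(3, 11))
```

Note that only the discordant counts enter the test; concordant readings carry no information about which test is more sensitive.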
Hi, firstly, wow, you seem to be quite knowledgeable in this area, thank you.

So I'm specifically looking at patients post-relocation of the knee, looking for associated fractures. I'm doing a retrospective study, so my pool of patients will be all those with a confirmed dislocation on initial X-ray who have gone on to have imaging of the knee post-relocation. There won't be any control patients as such.

Yes, every patient in my pool had tests A and B. The fractures were originally/definitively diagnosed using tests A + B, as it has such high sensitivity and specificity that there isn't really a better way of testing for it other than autopsy. Yes, you hit the nail on the head: my study will have limitations because, as you say, tests A and B combined don't have 100% sensitivity and specificity, and therefore Test A or B alone will just be a percentage of this.

Yes, I'm assigning 10% of the images to every radiologist so I can assess inter-rater reliability. We call that computer-aided diagnosis, and it will be the future, but it's not available just yet.
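(For the 10% overlap images, agreement between radiologists is often summarised with Cohen's kappa. A minimal sketch for two raters' binary fracture calls, with made-up ratings:)

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' binary calls (1 = fracture, 0 = none)."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n  # observed agreement
    # expected agreement if the raters were independent
    pa1, pb1 = sum(ratings_a) / n, sum(ratings_b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical calls by two radiologists on the same 10 overlap images
r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
r2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
print(round(cohens_kappa(r1, r2), 3))  # -> 0.6
```

Kappa of 1 is perfect agreement and 0 is agreement no better than chance; with more than two raters, Fleiss' kappa is the usual extension.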

So yes, I have to assume that I have 0 false negatives and 0 false positives for A + B. However, like you said, it would be reasonable to say the fractures missed on A + B would be very few and likely insignificant. If I had an accessible gold standard that would be a lot easier, but I only have access to the retrospective images; unfortunately, patients tend to resist autopsy while still living.

How would I statistically compare A or B vs A + B combined in terms of sensitivity and specificity? I can work out the TP, TN, FP, and FN in a 2x2 table for A or B compared to A + B and then work out the confidence intervals. The issue comes when, say, A + B spots two fractures and Test A only spots one of them but misses the other. Is that a true positive and a false negative at the same time, and is it therefore reasonable to call it a false negative overall? There doesn't seem to be much literature guidance on doing this.

The images will be reviewed to ensure they are optimal with no movement artefact.