I'm trying to compare the performance of 5 different software programs that act as diagnostic tools. All they do is ask the patient questions about symptoms.

I have given each program 20 vignettes, each describing a patient with 20-40 symptoms that the software should discover by asking questions.

I call the number of symptoms the software discovered, divided by the total number of symptoms the patient had in the vignette, the sensitivity. Averaging these proportions over all vignettes for each program gives the "mean sensitivity", or the sensitivity of that diagnostic tool.

I call the number of symptoms the software discovered, divided by the total number of questions it asked, the "specificity". Again I get 20 proportions per program (one per vignette), which I can average for each program.
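To make the two definitions concrete, here is a minimal sketch of the calculation for one program. The counts are made up for illustration, and the function and variable names are hypothetical.

```python
from statistics import mean


def vignette_scores(found, total_symptoms, questions_asked):
    """Return (sensitivity, "specificity") for one vignette, as defined above."""
    sensitivity = found / total_symptoms    # discovered / symptoms in the vignette
    specificity = found / questions_asked   # discovered / questions asked
    return sensitivity, specificity


# Hypothetical counts for one program on three vignettes:
# (symptoms discovered, symptoms in vignette, questions asked)
results = [(15, 30, 40), (10, 20, 25), (24, 40, 60)]

scores = [vignette_scores(*r) for r in results]
mean_sensitivity = mean(s for s, _ in scores)
mean_specificity = mean(p for _, p in scores)

print(f"mean sensitivity = {mean_sensitivity:.3f}")
print(f"mean specificity = {mean_specificity:.3f}")
```

With the real data this would run over all 20 vignettes for each of the 5 programs.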

Now I want to compare the performance of the diagnostic tools. How would you do that?

====

I thought one way to do so is to use a chi-square test, with which I could show a significant difference between the sensitivities, or between the specificities, of the different tools. What post hoc test would you use (let's say, to show whether one diagnostic tool is performing better, i.e. its ratio is bigger than the others')?
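A rough sketch of what the chi-square idea could look like, using only the Python standard library. It assumes the discovered/missed symptom counts are pooled over all vignettes for each tool, which treats every symptom as an independent observation; that may not hold, since all tools see the same vignettes. The counts are invented, the helper names are hypothetical, and the pairwise 2x2 tests with a Bonferroni correction are shown only as one possible post hoc step, not as a recommendation.

```python
import math
from itertools import combinations


def chi_square_stat(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_pvalue(stat, df):
    """Upper-tail p-value; this sketch only covers df == 1 and even df."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df % 2 == 0:
        half = stat / 2
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    raise ValueError("df not supported in this sketch")


# Hypothetical pooled counts per tool: (symptoms found, symptoms missed)
counts = {
    "A": (420, 180),
    "B": (390, 210),
    "C": (450, 150),
    "D": (400, 200),
    "E": (380, 220),
}

# Overall 5 x 2 test: do the tools differ in the proportion of symptoms found?
overall_stat, overall_df = chi_square_stat([list(c) for c in counts.values()])
overall_p = chi_square_pvalue(overall_stat, overall_df)

# Post hoc: every pair of tools as a 2 x 2 table, Bonferroni-adjusted
# (no continuity correction is applied here).
pairs = list(combinations(counts, 2))
adjusted_p = {}
for a, b in pairs:
    stat, df = chi_square_stat([list(counts[a]), list(counts[b])])
    adjusted_p[(a, b)] = min(1.0, chi_square_pvalue(stat, df) * len(pairs))

print(f"overall: chi2={overall_stat:.2f}, df={overall_df}, p={overall_p:.4g}")
for pair, p in adjusted_p.items():
    print(pair, f"adjusted p={p:.4g}")
```

The same layout would apply to the "specificity" by replacing the second column with the number of questions that did not uncover a symptom.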

Any other ideas on how to compare the groups? How can I graphically emphasize the difference in performance?

Thank you. David