Don't simplify. It just makes everything blurred and vague. What is your research goal? Do you want to assess the rate of patients who appeared at the follow-up and compare it with the number in the first group? That gives only one percentage (patients attending the follow-up as a percentage of all patients), not two, so no comparison can be made. Or do you have a couple of diseases among the patients, and you want to see whether patients with disease A return more often? Please explain completely. Please also describe the other study factors, such as the sample sizes, so your question is meaningful to a future reader too. Thanks.
1) In the "Ideal" case, the two populations are separate. What is the best test for this? Chi-square?
Yes, chi-square would be good. But how are these two populations separate in the "ideal" situation? Quite the opposite: it seems that in the ideal case, all the patients attending a follow-up must be those who already attended the first session, no? How can these two populations be separate, and how is that ideal?
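For what it's worth, here is a minimal sketch of the chi-square test for the "separate populations" case, written with only the Python standard library. The counts are made up purely for illustration (they are not from the poster's data), and the function name chi_square_2x2 is just a hypothetical helper.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = pre / post intervention,
# columns = linked to treatment / not linked.
stat, p_value = chi_square_2x2(18, 22, 27, 13)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

This treats every patient as an independent observation, which only holds if nobody appears in both rows. The chi-square approximation is also usually considered acceptable only when the expected count in each cell is at least about 5.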
2) In the "Real" case, some of the patients in the 2011 data are also in the 2012 data set. Which test is most appropriate in this case?
Do you know exactly which patients were the same? If yes, separate them and use McNemar's test for the "same" patients and a chi-square test for the others. Otherwise, if you don't know which patients are the same and which are different, you have a murky sample, and I don't think either of these tests would be correct. However, if you have no other choice, go with chi-square (its independence assumption is violated, so it is strictly incorrect, but it is all you have [and I have seen similar cases get published, so you might have a chance]).
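As a rough sketch of the McNemar part (for the patients you can identify in both years), the exact version of the test only needs the two discordant counts. The numbers below are hypothetical, and mcnemar_exact is an illustrative helper name, not a library function.

```python
import math

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b = patients linked in one period only (e.g. post but not pre),
    c = patients linked in the other period only (e.g. pre but not post).
    Concordant pairs (same outcome both times) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts among the tracked patients.
p_value = mcnemar_exact(9, 2)
print(f"exact McNemar p = {p_value:.4f}")
```

The exact form is used here because the number of tracked patients may be small, where the chi-square approximation to McNemar's test is less trustworthy.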
Also check out trend analysis.
-------------------
jdub said:
Sure, allow me to explain. (First time poster here, so now I know I shouldn't simplify!) I am comparing the rate of "linkage to treatment" pre- vs post-intervention. (Linkage to treatment = patients who receive a positive test result who then later show up for treatment). The intervention in this case is a type of reminder / alert / messaging system. I have three different populations because I am studying its use in three different cases / diseases / clinic types. For disease A, the two populations (pre & post) are around n = 40 each. For disease B, it's around n = 80 or so. For disease C, it's around 2,000 each.
I'm not sure yet if I will be able to track patients across pre- and post-intervention phases. Your suggestion about McNemar's test is well-taken. If in fact I cannot make this determination, do you have a recommendation of a test I should use? Presumably a non-parametric test?
I have not used trend analysis before but I will check it out. Thank you!
Thanks for explaining. Some extra explanations are still needed, though. For example, in the first population (A), are all the subjects suffering from disease A, or is only a fraction of that population diseased? Do you want to compare disease A with B with C, or do you just care about the rates of "linkage to treatment" before and after reminding patients of their sessions within each group? Since you cannot track your patients, McNemar's test cannot be used. Strictly speaking, chi-square cannot be used either (assumption violation). Still, chi-square is used in such situations when there is no other hope, as I said above [I have seen such studies]. Besides, if you want to compare A with B with C, McNemar is no longer usable. So chi-square might be the test of choice, although it is still not completely reliable. More sophisticated tests can be used in more complicated designs (depending on exactly what your design is), but they would be affected by the same problem of a murky sample. I also think more sophisticated tests would just be confusing at this stage, so I suggest sticking with chi-square for now.