How to measure coherence of responses on surveys

Sorry all, I've tried Google, Wikipedia, and searching here. I suspect I don't have the right buzzword.

Here's the study: I am implementing an educational methodology for evaluating journal articles in Journal Club. There will be a 6-month pre-intervention survey period, followed by a 6-month post-intervention period. The survey is very simple and consists of 2 Likert scales: one measuring how strong or flawed the methodology of the paper was, and one measuring how applicable the conclusions are to clinical practice.

Now of course, there is no "gold standard" to judge how good an article is. So instead, I will measure response cohesion by month. My hypothesis is that in the loosey-goosey free-for-all that is journal club where I now teach, the respondents will have very high variance, but after I implement my methodology, it will tighten up, because they are focusing on the same aspects of the paper.

This is what I have so far: Since several articles are reviewed every month, I have to standardize the data. So for each question, I will subtract the mean of all the responses to that question for that article. Then for the whole month, I'll take the standard deviation of the mean-subtracted scores. Good so far. What I expect to see is a plateau of monthly SDs for the first six months, with a decline over the next six months (the null hypothesis being that the plateau continues).
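To make the procedure concrete, here is a minimal sketch of the centering and monthly-SD steps in Python. The record layout `(month, article, question, score)`, the sample scores, and the function name `monthly_sd` are all made up for illustration; they are not real survey data. Note that the plain sample SD divides by n - 1, which slightly understates the spread when several article means have been subtracted, so treat the numbers as descriptive.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (month, article_id, question, likert_score)
responses = [
    (1, "A", "methodology", 2),
    (1, "A", "methodology", 5),
    (1, "A", "methodology", 4),
    (1, "B", "methodology", 1),
    (1, "B", "methodology", 3),
    (2, "C", "methodology", 3),
    (2, "C", "methodology", 4),
    (2, "C", "methodology", 3),
]

def monthly_sd(records, question):
    """Return {month: SD of article-mean-centered scores} for one question."""
    # Group the raw scores by (month, article).
    by_article = defaultdict(list)
    for month, article, q, score in records:
        if q == question:
            by_article[(month, article)].append(score)

    # Subtract each article's mean, then pool the deviations by month.
    centered = defaultdict(list)
    for (month, _article), scores in by_article.items():
        article_mean = mean(scores)
        centered[month].extend(s - article_mean for s in scores)

    # Sample standard deviation of the pooled deviations, per month.
    return {month: stdev(vals) for month, vals in sorted(centered.items())}

sds = monthly_sd(responses, "methodology")
for month, sd in sds.items():
    print(f"month {month}: SD of centered scores = {sd:.3f}")
```

The same function would be called once per survey question, giving one series of twelve monthly SDs for each scale.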

Now here's the rub: How do I demonstrate that the decline is not due to random chance? Since it's linear data, can I apply the standard error about *the standard deviation* to form a confidence interval?

I get the very definite feeling that I am re-inventing a square wheel, but I don't know the terminology to search for. I'm posting this in Psychology even though that's not my field, because it seems to me psychologists would have rendered this problem tractable by now.

You are right that this road has been trodden before. What you want to look for is the literature on testing equality of variances. There are about four common tests from which you might choose. One of them is the Levene test, which tends to make a person say "Oh, shucks!" because they have seen it so often in a different context: as a diagnostic test performed before doing a t-test or ANOVA. I've been in that situation too.

I'm afraid I can't get involved at the level you're asking. If you have access to a stat package such as SPSS, your work will be much easier. If you do it on a spreadsheet, you'll need to read up carefully on exactly how the Levene test is computed. Good luck with it!
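For anyone computing it by hand, here is a rough sketch of the Levene statistic in plain Python, so the arithmetic can be checked against a spreadsheet. The two lists of centered scores are invented for illustration, and `levene_w` is just a name chosen here. Centering on the median gives the Brown-Forsythe variant; passing `center=mean` gives the original Levene version. The test also assumes the responses are independent, which may not hold if the same people answer every month.

```python
from statistics import mean, median

def levene_w(*groups, center=median):
    """Levene's W statistic for equality of variances across groups."""
    # Absolute deviation of every score from its own group's center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(z)                              # number of groups
    n_total = sum(len(g) for g in z)        # total number of scores
    z_bar_i = [mean(g) for g in z]          # mean deviation per group
    z_bar = sum(sum(g) for g in z) / n_total  # overall mean deviation

    # One-way ANOVA F ratio computed on the absolute deviations.
    between = sum(len(g) * (zb - z_bar) ** 2 for g, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for g, zb in zip(z, z_bar_i) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical mean-centered scores, pooled before and after the intervention.
pre = [-2, -1, 0, 1, 2, -2, 2, 1, -1, 0]
post = [-1, 0, 0, 1, 0, 0, -1, 1, 0, 0]

w = levene_w(pre, post)
df1, df2 = 2 - 1, len(pre) + len(post) - 2
print(f"W = {w:.3f} on ({df1}, {df2}) degrees of freedom")
```

W is compared against an F distribution with (k - 1, N - k) degrees of freedom to get a p-value. If SciPy is available, `scipy.stats.levene(pre, post, center='median')` should return the same statistic together with the p-value.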