My study compares the effectiveness of three different teaching methodologies and includes a control group. I have data for a pilot study and a main study. I’m using SPSS 12.0.1.

Research questions:

1. What effect, if any, does training have on reading comprehension, context ability and vocabulary size?

2. Does the method of training (method A, method B or method C) have any effect on reading comprehension, context ability and vocabulary size?

Pilot study – 17 participants in all, divided into 4 groups with 5, 3, 3 and 6 members.

· Pre-tests: Reading comprehension (RC), context ability (CA), several vocabulary size measures (V), and language aptitude (LA). A language proficiency (LP) score was also obtained.

· Pre-tests followed by training.

· Training followed by immediate post-tests: reading comprehension and context ability.

· Delayed post-tests: RC, CA, V.

Statistics for pilot study –

· Test for normality of distribution of the RC and CA tests for the group as a whole: Kolmogorov-Smirnov tests are non-significant (.200), but the Q-Q plots show the points scattered around the line rather than falling on it.
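For what it’s worth, here is roughly what that normality check looks like outside SPSS, as a Python/SciPy sketch with made-up scores standing in for my real RC data. (Two notes: SPSS’s “.200” for the Lilliefors-corrected K-S is reported as a lower bound, not an exact p-value, and for a sample this small the Shapiro-Wilk test is usually recommended instead.)

```python
import numpy as np
from scipy import stats

# Made-up placeholder scores -- 17 values, like the pilot study
rng = np.random.default_rng(0)
rc_scores = rng.normal(loc=50, scale=10, size=17)

# Shapiro-Wilk test of normality; p > .05 means no evidence against normality
w_stat, p_value = stats.shapiro(rc_scores)
```

A non-significant result here, like a non-significant K-S, only says the sample is *consistent* with normality; with n = 17 the test has little power, which is why the Q-Q plot matters.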

· Test for equivalence of groups (to see if groups were equivalent before training):

Because the groups were so small, I used Kruskal-Wallis tests on the means for RC, CA, LP and V. I also combined the groups in various ways, which in some cases meant I was comparing only two groups; in those cases I used Mann-Whitney U tests. When a Kruskal-Wallis test showed a significant or near-significant difference, I then compared the groups in pairs, using Mann-Whitney U tests, to see where the difference was. (Is this legal? I have the feeling that I am increasing my chances of some kind of error by doing repeated comparisons . . . .)
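To make the procedure concrete, here is a rough Python/SciPy sketch of what I did, with made-up numbers standing in for my data (the real pilot group sizes were 5, 3, 3 and 6). The pairwise follow-ups do inflate the family-wise error rate; one simple guard is a Bonferroni-adjusted alpha, shown here as an assumption about how the correction would be applied, not something I did in SPSS:

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up placeholder scores for the four pilot groups (sizes 5, 3, 3, 6)
rng = np.random.default_rng(1)
groups = [rng.normal(50, 5, n) for n in (5, 3, 3, 6)]

# Omnibus Kruskal-Wallis test across all four groups
h, p_kw = stats.kruskal(*groups)

# Pairwise Mann-Whitney U follow-ups; with 4 groups there are 6 comparisons,
# so each raw p-value is judged against a Bonferroni-adjusted alpha
pairs = list(combinations(range(4), 2))
alpha_adj = 0.05 / len(pairs)   # 0.05 / 6
p_pairwise = [stats.mannwhitneyu(groups[i], groups[j]).pvalue
              for i, j in pairs]
```

The usual advice is to run the pairwise tests only if the omnibus test is significant, and then apply some multiple-comparison correction like the one sketched above.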

· Comparison of groups’ performance:

I first compared the mean gain scores for each group, using Kruskal-Wallis tests. Again, if I saw a significant difference or near-significant difference, I then used Mann-Whitney U tests on pairs of means, to see where the significant difference was. (same concern as above - is this legal?)

I also compared, within each group, pre-test and post-test scores, to see which groups had a significant improvement from pre-test to post-test. For this comparison, I used the “two-related samples test” under non-parametric tests (i.e., the Wilcoxon signed-rank test). (I’m not sure if this makes sense to do . . . )
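That within-group comparison looks like this as a Python/SciPy sketch, again with made-up pre/post scores standing in for one pilot group. The Wilcoxon signed-rank test is the right shape for this: the same people tested twice, with no normality assumption:

```python
import numpy as np
from scipy import stats

# Made-up pre/post scores for one pilot group of 5 (same people, two tests)
pre  = np.array([40, 42, 38, 45, 41])
post = np.array([44, 45, 40, 50, 42])

# Wilcoxon signed-rank test on the paired differences (post - pre)
stat, p = stats.wilcoxon(post, pre)
```

With only 5 pairs, even a uniform improvement cannot reach p < .05 two-tailed (the smallest possible p is 2/32 = .0625), which is worth keeping in mind when interpreting the pilot results.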

Main study: 44 participants in all, divided into 4 groups, with 14, 11, 10 and 9 members.

(Pre-test, training, immediate post-test and delayed post-test the same as in pilot study.)


Statistics for main study –

· Test for normality of distribution of the RC and CA tests for the group as a whole: Kolmogorov-Smirnov tests are non-significant (.200), but the Q-Q plots show the points scattered around the line rather than falling on it.

· Test for equivalence of groups (to see if groups were equivalent before training):

Because the groups were larger than in the pilot study, and because the data appeared to be normally distributed (for the RC and CA pre-tests, at least), I decided it was OK to use parametric tests. I conducted either one-way ANOVAs or (when comparing two groups) independent-samples t-tests on the means for LP, LA, RC and CA. (The big question is, should I have used parametric or non-parametric techniques for any part of the main study?)
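Here is a Python/SciPy sketch of that equivalence check, with made-up scores standing in for my data (the real main-study group sizes were 14, 11, 10 and 9). I’ve also added a Levene test, since homogeneity of variance is an ANOVA assumption worth checking alongside normality; that addition is my own, not part of what I ran in SPSS:

```python
import numpy as np
from scipy import stats

# Made-up placeholder scores for the four main-study groups (14, 11, 10, 9)
rng = np.random.default_rng(2)
groups = [rng.normal(50, 10, n) for n in (14, 11, 10, 9)]

# Levene's test for homogeneity of variance across the groups
w, p_levene = stats.levene(*groups)

# One-way ANOVA across all four groups
f, p_anova = stats.f_oneway(*groups)

# When only two (possibly combined) groups are compared, an
# independent-samples t-test does the same job
t, p_t = stats.ttest_ind(groups[0], groups[1])
```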

· Comparison of groups’ performance on RC, CA and V:

I first compared the mean gain scores for each group, using one-way ANOVA. Because I was combining my groups in various ways, whenever I was comparing only two groups I used independent-samples t-tests to compare their mean gain scores.

I also compared, within each group, pre-test and post-test scores, to see which groups had a significant improvement from pre-test to post-test. For this comparison, I used the “paired-samples t-test” under “compare means”. (Again, I’m not sure if this makes sense to do . . . Also, should I have used parametric or non-parametric?)
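As a Python/SciPy sketch, with made-up pre/post scores standing in for one main-study group, the paired comparison looks like this. The paired t-test is appropriate here as long as the *differences* (the gain scores) are roughly normal, which is the thing to check rather than the raw scores:

```python
import numpy as np
from scipy import stats

# Made-up pre-test scores and gains for one main-study group of 10
pre  = np.array([40, 42, 38, 45, 41, 39, 44, 43, 37, 46])
gain = np.array([ 3,  1,  4,  2,  5,  1,  3,  2,  4,  2])
post = pre + gain

# Paired-samples t-test: same people measured before and after training
t, p = stats.ttest_rel(post, pre)
```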

· Looking for effects of other variables:

Language proficiency: I wanted to know whether there was any effect for LP in each group, so I compared the mean gain scores for each level of language proficiency (there were three levels in this study – beginning, intermediate or advanced) within each group, for CA and RC. Due to the small size of the groups, non-parametric tests (Kruskal-Wallis or Mann-Whitney) were used: Kruskal-Wallis when a group had all three levels of language proficiency, Mann-Whitney when there were only two levels of language proficiency in a group. I also used Mann-Whitney when I saw a significant or near-significant effect in a Kruskal-Wallis, to find out where the significant difference was. (Should I be using these tests, in this way? Same concern as in the pilot study . . . )

Language aptitude – I divided the participants into three levels of language aptitude, and then did the same analysis as above, for each group, using the same techniques.

Vocabulary size – To see if pre-test V had any effect on the groups’ performances, I correlated their mean gain scores with pre-test V. For this I chose “bivariate” under “correlate” – I asked for a Pearson correlation, and since I made a prediction that the higher the V score, the better the gain score, I chose the one-tailed test of significance.
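Here is that correlation as a Python/SciPy sketch, with made-up V and gain scores standing in for my data. SciPy’s `pearsonr` gives a two-tailed p by default, so the one-tailed value for my directional prediction is obtained by halving it when the correlation runs in the predicted direction:

```python
import numpy as np
from scipy import stats

# Made-up pre-test V scores and mean gain scores for six learners
v_pre = np.array([20, 25, 30, 35, 40, 45])
gain  = np.array([1.0, 2.5, 2.0, 3.5, 4.0, 4.5])

# Pearson correlation (two-tailed p by default)
r, p_two = stats.pearsonr(v_pre, gain)

# One-tailed p for the directional prediction: higher V -> higher gain
p_one = p_two / 2 if r > 0 else 1 - p_two / 2
```

Recent SciPy versions also accept `alternative='greater'` in `pearsonr` to get the one-tailed p directly.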

Overall, “big” questions:

· Testing for normal distribution: Do I have to subject all of my pre-test (and post-test) data to these tests? I have only considered two tests in terms of normal distribution, RC and CA. Do I also have to subject LP, LA and V to the same tests? V consists of seven separate sub-tests, so I’m not that excited about the prospect.

Also, if some of the data from some of the tests are normally distributed, but other data from other tests are not, what does that mean? For example, if the distribution of data for LA is not normal, what does that mean for the tests I do with LA – does that automatically mean I have to use non-parametric tests whenever LA is involved, even if the other data are normally distributed?

· Parametric v. non-parametric – I have basically let the size of my groups determine whether I use parametric or non-parametric techniques. For the pilot study, because the entire group is so small, and the constituent groups are even smaller, I just used non-parametric. For the main study, when working with the group as a whole, I used parametric, but when I was doing something within the groups, I switched to non-parametric because they were so small. Was this the right thing to do?

· Kruskal-Wallis vs. Mann-Whitney U – When I was comparing three or more groups using non-parametric techniques, I compared their means using Kruskal-Wallis. If a significant difference was seen, the only way I could figure out to see where the difference was, was to do Mann-Whitney U tests on pairs – for instance, if I had three groups, I’d do Mann-Whitney on 1 & 2, 1 & 3, and 2 & 3. Was this the right thing to do, or am I increasing my chances of some kind of error?
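You are increasing the family-wise error rate: with three pairwise tests at alpha = .05 each, the chance of at least one false positive is noticeably above .05. The standard fix is a multiple-comparison correction; the Bonferroni version (multiply each raw p by the number of comparisons, capped at 1) is the simplest, sketched here in Python/SciPy with made-up group scores:

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up placeholder scores for three groups
rng = np.random.default_rng(3)
data = {"1": rng.normal(50, 5, 14),
        "2": rng.normal(50, 5, 11),
        "3": rng.normal(58, 5, 10)}

# All pairwise Mann-Whitney U tests: (1,2), (1,3), (2,3)
pairs = list(combinations(data, 2))
raw = {(a, b): stats.mannwhitneyu(data[a], data[b]).pvalue
       for a, b in pairs}

# Bonferroni adjustment: multiply each raw p by the number of
# comparisons, capping the result at 1
adjusted = {pair: min(1.0, p * len(pairs)) for pair, p in raw.items()}
```

Bonferroni is conservative; if it feels too strict with many comparisons, a step-down procedure like Holm’s is a common alternative.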

Any advice much appreciated!