advice on statistics

#1
I'm a doctoral research student in England, and have conducted both a pilot study and a main study. I've done some statistical analysis already, and have done quite a bit of writing, but since statistics are definitely NOT my strong point, I'm worried that the analysis I've done so far is either way too simplistic and unsophisticated for a dissertation, or just plain wrong. I'm looking for advice about what I've done so far, and recommendations for more things I could do, or corrections. Here are the details - I apologize in advance if it's all a bit much:

My study compares the effectiveness of three different teaching methodologies, and includes a control group. I have data for a pilot study and a main study. I’m using SPSS 12.0.1.

Research questions:
1. what effect, if any, does training have on reading comprehension, context ability and vocabulary size?
2. does method of training (method A, method B, method C) have any effect on reading comprehension, context ability and vocabulary size?


Pilot study – 17 participants in all, divided into 4 groups with 5, 3, 3 and 6 members.

· Pre-tests: Reading comprehension (RC), context ability (CA), several vocabulary size measures (V), and language aptitude (LA). A language proficiency (LP) score was also obtained.

· Pre-tests followed by training.

· Training followed by immediate post-tests: reading comprehension and context ability.

· Delayed post-tests: RC, CA, V.

Statistics for pilot study –

· Test for normality of distribution of RC and CA scores for the group as a whole: Kolmogorov-Smirnov tests are non-significant (p = .200), but the Q-Q plots show the dots scattered around the line rather than falling on it.
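(If it helps to see the logic outside SPSS, here is a minimal sketch of this kind of normality check in Python with SciPy - the scores below are invented just to show the shape of the analysis, since the real data live in SPSS:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical RC pre-test scores for the whole sample (n = 44)
rc_scores = rng.normal(loc=60, scale=10, size=44)

# Shapiro-Wilk is often preferred to Kolmogorov-Smirnov at small n
w_stat, p_value = stats.shapiro(rc_scores)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

# Numeric analogue of a Q-Q plot: correlation between the sample
# quantiles and the theoretical normal quantiles (close to 1 = roughly normal)
osm, osr = stats.probplot(rc_scores, dist="norm", fit=False)
qq_corr = np.corrcoef(osm, osr)[0, 1]
print(f"Q-Q correlation = {qq_corr:.3f}")
```

(Note that this is not identical to SPSS's K-S output, which applies the Lilliefors correction; the point is only that dots "around, but not on" the line still give a high Q-Q correlation.)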

· Test for equivalence of groups (to see if groups were equivalent before training):
Because the groups were so small, I used Kruskal-Wallis tests on the means for RC, CA, LP and V. I also combined the groups in various ways, which in some cases meant comparing only two groups; in those cases I used Mann-Whitney U tests. When a Kruskal-Wallis test showed a significant or near-significant difference, I then compared the groups in pairs, using Mann-Whitney U tests, to see where the significant difference was. (Is this legal? I have a feeling I'm increasing my chances of some kind of error by doing repeated comparisons . . . .)
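(For reference, the same omnibus-then-pairwise sequence looks like this in Python/SciPy - group labels and scores are hypothetical, and the Bonferroni adjustment shown is one standard way to handle the repeated-comparisons worry:)

```python
from itertools import combinations
from scipy import stats

# Hypothetical pre-test scores for the four pilot groups (sizes 5, 3, 3, 6)
groups = {
    "G1": [52, 58, 61, 49, 55],
    "G2": [60, 63, 57],
    "G3": [45, 50, 48],
    "G4": [62, 59, 66, 64, 58, 61],
}

# Omnibus Kruskal-Wallis test across all four groups
h_stat, kw_p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {kw_p:.3f}")

# Pairwise Mann-Whitney follow-ups (only worth interpreting when the
# omnibus test is significant); a Bonferroni-adjusted alpha guards
# against the inflated Type I error risk from repeated comparisons
pairs = list(combinations(groups, 2))
adjusted_alpha = 0.05 / len(pairs)
for a, b in pairs:
    u, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    print(f"{a} vs {b}: U = {u:.1f}, p = {p:.3f} "
          f"(significant at adjusted alpha: {p < adjusted_alpha})")
```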

· Comparison of groups’ performance:
I first compared the mean gain scores for each group, using Kruskal-Wallis tests. Again, if I saw a significant difference or near-significant difference, I then used Mann-Whitney U tests on pairs of means, to see where the significant difference was. (same concern as above - is this legal?)

I also compared, within each group, pre-test and post-test scores, to see which groups had a significant improvement from pre-test to post-test. For this comparison, I used the “two-related samples test” under non-parametric tests (i.e. the Wilcoxon signed-rank test). (I’m not sure if this makes sense to do . . . )
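(SciPy's equivalent of SPSS's two-related-samples test is `scipy.stats.wilcoxon`; a sketch with invented pre/post scores for one small group:)

```python
from scipy import stats

# Hypothetical pre/post RC scores for one pilot group (n = 5)
pre = [48, 52, 55, 50, 47]
post = [56, 57, 61, 53, 51]

# Wilcoxon signed-rank test on the paired differences
stat, p = stats.wilcoxon(pre, post)
print(f"Wilcoxon W = {stat}, p = {p:.4f}")
```

(With every participant improving, W is 0 - the most extreme result possible - yet with n = 5 the test may still not reach p < .05, which previews the small-sample power issue discussed below.)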




Main study: 44 participants in all, divided into 4 groups, with 14, 11, 10 and 9 members.

(Pre-test, training, immediate post-test and delayed post-test the same as in pilot study.)

Statistics for main study –

· Test for normality of distribution of RC and CA scores for the group as a whole: Kolmogorov-Smirnov tests are non-significant (p = .200), but the Q-Q plots show the dots scattered around the line rather than falling on it.

· Test for equivalence of groups (to see if groups were equivalent before training):
Because the groups were larger than in the pilot study, and because the data appeared normally distributed (for the RC and CA pre-tests, at least), I decided it was OK to use parametric tests. I conducted either one-way ANOVA or (when comparing two groups) independent t-tests on the means for LP, LA, RC and CA. (The big question is, should I have used parametric or non-parametric techniques for any part of the main study?)
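(The parametric versions of the equivalence checks, sketched in SciPy with invented scores - group sizes match the main study:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical LP scores for the four main-study groups (sizes 14, 11, 10, 9)
groups = [rng.normal(60, 10, n) for n in (14, 11, 10, 9)]

# One-way ANOVA across all four groups
f_stat, p_anova = stats.f_oneway(*groups)
print(f"ANOVA F = {f_stat:.2f}, p = {p_anova:.3f}")

# When only two (possibly merged) groups are compared, an independent t-test
merged_ab = np.concatenate([groups[0], groups[1]])
merged_cd = np.concatenate([groups[2], groups[3]])
t_stat, p_t = stats.ttest_ind(merged_ab, merged_cd)
print(f"t = {t_stat:.2f}, p = {p_t:.3f}")
```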

· Comparison of groups’ performance on RC, CA and V:
I first compared the mean gain scores for each group, using one-way ANOVA. Because I was combining my groups in various ways, whenever I was comparing only two groups, I used independent t-tests to compare their mean gain scores.

I also compared, within each group, pre-test and post-test scores, to see which groups had a significant improvement from pre-test to post-test. For this comparison, I used the “paired-samples t-test” under “compare means”. (Again, I’m not sure if this makes sense to do . . . Also, should I have used parametric or non-parametric?)
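(The parametric pre/post comparison, as a SciPy sketch on invented data with a built-in average gain:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical pre/post CA scores for one main-study group (n = 14);
# the post scores are constructed with an average gain of about 4 points
pre = rng.normal(50, 8, 14)
post = pre + rng.normal(4, 3, 14)

# Paired-samples t-test on the within-participant differences
t_stat, p = stats.ttest_rel(post, pre)
print(f"paired t = {t_stat:.2f}, p = {p:.4f}")
```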

· Looking for effects of other variables:

Language proficiency: I wanted to know whether there was any effect for LP in each group, so I compared the mean gain scores for each level of language proficiency (there were three levels in this study – beginning, intermediate or advanced) within each group, for CA and RC. Due to the small size of the groups, non-parametric tests (Kruskal-Wallis or Mann-Whitney) were used: Kruskal-Wallis when a group had all three levels of language proficiency, Mann-Whitney when there were only two. I also used Mann-Whitney when I saw a significant or near-significant effect in a Kruskal-Wallis test, to find out where the significant difference was. (Should I be using these tests in this way? Same concern as in the pilot study . . . )

Language aptitude – I divided the participants into three levels of language aptitude, and then did the same analysis as above, for each group, using the same techniques.

Vocabulary size – To see if pre-test V had any effect on the groups’ performances, I correlated their mean gain scores with pre-test V. For this I chose “bivariate” under “correlate” – I asked for a Pearson correlation, and since I made a prediction that the higher the V score, the better the gain score, I chose the one-tailed test of significance.
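(The one-tailed correlation can be run directly in SciPy 1.9+ via the `alternative` argument of `pearsonr`; the vocabulary sizes and gains below are invented, with a positive relationship built in:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical pre-test vocabulary sizes and gain scores for 44 participants,
# constructed so that higher V tends to go with higher gain
v_pre = rng.normal(2000, 400, 44)
gain = 0.005 * v_pre + rng.normal(0, 5, 44)

# One-tailed test of the directional prediction (higher V -> higher gain)
result = stats.pearsonr(v_pre, gain, alternative="greater")
print(f"r = {result.statistic:.3f}, one-tailed p = {result.pvalue:.4f}")
```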


Overall, “big” questions:

· Testing for normal distribution: Do I have to subject all of my pre-test (and post-test) data to these tests? I have only considered two tests in terms of normal distribution, RC and CA. Do I also have to subject LP, LA and V to the same tests? V consists of seven separate sub-tests, so I’m not that excited about the prospect.

Also, if some of the data from some of the tests are normally distributed, but other data from other tests are not, what does that mean? For example, if the distribution of data for LA is not normal, what does that mean for the tests I do with LA – does that automatically mean I have to use non-parametric tests whenever LA is involved, even if the other data are normally distributed?

· Parametric v. non-parametric – I have basically let the size of my groups determine whether I use parametric or non-parametric techniques. For the pilot study, because the entire group is so small, and the constituent groups are even smaller, I just used non-parametric. For the main study, when working with the group as a whole, I used parametric, but when I was doing something within the groups, I switched to non-parametric because they were so small. Was this the right thing to do?

· Kruskal-Wallis vs. Mann-Whitney U – When I was comparing three or more groups using non-parametric techniques, I compared their means using Kruskal-Wallis. If a significant difference was seen, the only way I could figure out how to see where the difference was, was to do Mann-Whitney U tests on pairs – for instance, with three groups I’d do Mann-Whitney on 1 & 2, 1 & 3, and 2 & 3. Was this the right thing to do, or am I increasing my chances of some kind of error?

Any advice much appreciated!
 

JohnM

TS Contributor
#2
Jodee,

First of all, I think you've done an excellent job - no one could ever question your "due diligence."

Just a few comments and suggestions - nothing major:

(1) Don't sweat the normality thing too much - do the testing for the sake of completeness, but don't worry about it. Parametric tests are remarkably robust to violations of the normality assumption. One thing many people forget is that statistical inference, in these cases, is based on sample means, which tend to have a normal distribution no matter what distribution the individual observations follow. Also - for normal probability plots, the points don't have to fall right on the line.
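(A quick simulation makes the sample-means point concrete - the exponential distribution here is just a deliberately extreme, skewed choice:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Heavily skewed individual scores (exponential - nothing like a normal curve)
samples = rng.exponential(scale=10, size=(10_000, 50))

# Distribution of the *sample means* of groups of 50
means = samples.mean(axis=1)

# Skewness shrinks from ~2 for individuals toward 0 for the means
print(f"skew of individuals:  {stats.skew(samples.ravel()):.2f}")
print(f"skew of sample means: {stats.skew(means):.2f}")
```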

(2) It's perfectly acceptable and standard practice to do an "omnibus" test like ANOVA or KW and follow up with pairwise comparisons - yes you do run the risk of increasing your Type I error rate as you increase the number of tests, but you really have no choice - just mention it in the "Limitations of This Study" section. If the theory / hypotheses that you are investigating support the notion of doing these comparisons, then you need to do the comparisons.....
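(If you want to do more than just mention the Type I error risk in the limitations section, a step-down correction like Holm's is easy to apply by hand to the pairwise p-values - the p-values below are made up:)

```python
# Hypothetical raw p-values from three pairwise Mann-Whitney follow-ups
raw_p = [0.010, 0.030, 0.200]

# Holm step-down correction: compare the k-th smallest p-value against
# alpha / (m - k), stopping at the first one that fails
alpha, m = 0.05, len(raw_p)
significant = []
for k, p in enumerate(sorted(raw_p)):
    if p < alpha / (m - k):
        significant.append(p)
    else:
        break
print(f"significant after Holm correction: {significant}")
```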

(3) I would avoid switching from parametric to non-parametric tests "mid-stream" - do the entire analysis with nonparametric methods, then do the entire analysis with parametric methods, and compare your results - if they are similar, then that's good! If they're not, then you have something else interesting to write about.....

(4) Choosing to use a nonparametric method just because you have small sample sizes is not necessarily wise - nonparametric methods do not have a lot of power when n is small. Remember, a "significant" result with many rank-based tests is based on finding an "unusual" rank permutation - if n is small, then there aren't many possible permutations, and none of them will be "unusual" enough to be statistically significant.
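(To see just how hard this limit bites, here's a quick SciPy check with toy data - with two groups of three, even the most extreme possible separation cannot reach two-sided significance:)

```python
from math import comb
from scipy import stats

# Two groups of three with the most extreme separation possible
low, high = [1, 2, 3], [4, 5, 6]

u, p = stats.mannwhitneyu(low, high, alternative="two-sided", method="exact")
print(f"U = {u}, exact two-sided p = {p:.3f}")

# There are only C(6, 3) = 20 ways to assign the ranks, so the smallest
# achievable two-sided p-value is 2/20 = 0.10 - never below .05
print(f"number of rank assignments: {comb(6, 3)}")
```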

(5) If you generally got the same results between the pilot study and the main study, then that's very powerful evidence - two independent studies supporting the same notion is much more powerful than one big study that has a highly significant result.

(6) Nothing is "illegal" in statistics - you can do anything you want, as long as you can justify it and understand the risks and pitfalls of what you've done.

(7) Don't worry about making your statistical methods sophisticated enough for a dissertation - it's about extending the knowledge in your field - if you can do that with a simple test, then all the better....

Best of luck,
John
 
#3
Thanks very much, John - you've been very helpful, and I feel much better now!

I may have more questions a little later, when I've had another look, but thanks for your help!