# Which test should I use?

#### Vthompson

##### New Member
I wish to compare test scores from 3 different learning environments (online, f2f and blended) to see which setting performed better on the common exam they took. The sample sizes and distributions are: online n=18 (normal), f2f n=27 (left-skewed) and blended n=43 (normal). Which statistical test should I use? Thanks in advance for your help.

#### obh

##### Member
Hi,

I assume there are different people in each learning environment, i.e., independent groups? And the test score is continuous?
I assume the left-skewed group is not close to symmetric?

If so, you may compare the two normal groups with each other using the t-test, and compare between the normal and the non-normal groups using the Mann-Whitney U test.
Since you have 3 comparisons, you should choose a smaller significance level (α).

If you only want to know whether all environments have equal scores, you may use the Kruskal–Wallis test (or one-way ANOVA if the skewed data is almost symmetric).
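For readers following along, here is a minimal stdlib-only Python sketch of the Kruskal–Wallis test mentioned above (in practice `scipy.stats.kruskal` does this for you). It ranks all scores together, gives tied scores their average rank, and applies the usual tie correction. The p-value uses the chi-square approximation: with exactly 3 groups there are 2 degrees of freedom, and the chi-square(2) upper tail is simply exp(-H/2), which is why this sketch refuses any other number of groups. The toy scores are made up.

```python
import math
from itertools import chain

def average_ranks(values):
    """Rank values 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Kruskal-Wallis H (tie-corrected) and chi-square p-value for 3 groups."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles exactly 3 groups (df = 2)")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    p = math.exp(-h / 2)  # chi-square upper tail with 2 degrees of freedom
    return h, p

# Toy data only; with so few scores per group the chi-square p is rough.
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), round(p, 4))
```

With real data you would pass the three lists of exam scores (online, f2f, blended) instead of the toy lists.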

#### Vthompson

##### New Member
Yes, all three environments were independent groups, and the scores ranged from 0 to 100. Using the Shapiro-Wilk test (p > .05), the z value was computed at -1.98, which is not between ±1.96, so it showed the data to be left-skewed. The box plot also looked left-skewed compared to the other settings, which had some skewness but were still approximately normal.

I want to determine which environment (web, blended, f2f) performed better than the others. All of them were taught in different settings but took the same exam f2f. Since I want to compare all three environments, should I just go ahead and use the Mann-Whitney U test? Thanks for your help.

#### obh

##### Member
Generally, the fact that the data failed the Shapiro-Wilk test doesn't mean it is skewed. A symmetric distribution may still be non-normal and fail the test.
Even if the sample is skewed, that doesn't mean the population is skewed; for this, you need to test the data for skewness.

For a very big sample, the Shapiro-Wilk test may reject normality even for fairly normal data, because it will detect a minor deviation from normality that is significant only because of the large sample.

Can you please show the histogram (or, better, paste the data) of the skewed group?

The following online calculator runs the Shapiro-Wilk test, but it also checks the skewness of the sample and tests the significance of the skewness.
http://www.statskingdom.com/320ShapiroWilk.html
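A common way to test a sample for skewness is to divide the sample skewness by its standard error and compare the resulting z with ±1.96. I am not certain this is exactly the formula the calculator above uses. Below is a small stdlib Python sketch using the adjusted Fisher-Pearson skewness and the standard-error formula commonly reported by SPSS; the example scores are hypothetical.

```python
import math

def skewness_z(data):
    """Return (skewness, standard error, z) for a sample of at least 3 values.

    skewness: adjusted Fisher-Pearson coefficient G1
    standard error: sqrt(6n(n-1) / ((n-2)(n+1)(n+3)))
    z = G1 / SE; |z| > 1.96 suggests significant skew at the 5% level.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    g1 = m3 / m2 ** 1.5                          # moment-based skewness
    big_g1 = math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return big_g1, se, big_g1 / se

# Hypothetical scores with a long left tail (a few very low scores).
skew, se, z = skewness_z([35, 52, 70, 74, 78, 80, 82, 84, 85, 88])
print(round(skew, 2), round(se, 2), round(z, 2))
```

A negative z indicates left skew; as noted above, this checks skewness specifically, which is a different question from whether Shapiro-Wilk rejects normality.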

But if you just create a histogram, you can check the chart, and if it is reasonably symmetric, you may use the t-test.
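If no charting software is at hand, even a crude text histogram shows whether a group looks roughly symmetric. A minimal Python sketch, assuming scores on a 0 to 100 scale and 10-point bins (both arbitrary choices; the sample scores are made up):

```python
def text_histogram(scores, bin_width=10, low=0, high=100):
    """Return a text histogram; bins are [lo, lo + bin_width), and the top
    value (e.g. 100) is counted in the last bin. Assumes low <= score <= high."""
    n_bins = (high - low) // bin_width
    counts = [0] * n_bins
    for s in scores:
        idx = min(int((s - low) // bin_width), n_bins - 1)
        counts[idx] += 1
    rows = []
    for i, count in enumerate(counts):
        lo = low + i * bin_width
        rows.append(f"{lo:3d}-{lo + bin_width:3d} | {'*' * count}")
    return "\n".join(rows)

print(text_histogram([35, 52, 70, 74, 78, 80, 82, 84, 85, 88]))
```

A long run of stars on the high side with a thin tail trailing toward the low bins is what a left-skewed group looks like.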

Please note that, unlike the t-test, which compares the groups' averages, the rank test compares the entire distributions (and this is not a problem).
When the two groups' distributions have a similar shape, the test also compares the medians of the groups.
For a symmetric distribution, the median equals the mean.
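A tiny Python illustration of the last point, with made-up scores: in a symmetric sample the mean and median coincide, while a long left tail pulls the mean below the median.

```python
from statistics import mean, median

symmetric = [60, 65, 70, 75, 80]     # hypothetical, symmetric around 70
left_skewed = [30, 68, 72, 74, 76]   # hypothetical, one very low score

print(mean(symmetric), median(symmetric))      # mean and median are both 70
print(mean(left_skewed), median(left_skewed))  # mean 64 is below median 72
```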


#### Vthompson

##### New Member
Thanks for your reply. That website is so helpful. It did not work in Internet Explorer, but it worked in Mozilla Firefox.
After entering the data, it tested not normal.
I also checked the data for my other 2 settings, and they tested normal.
So, just to make sure I understood your first reply... you said "and between the normal and the non-normal using Mann Whitney U test." Are you saying I can use this test to reveal which setting performed better than the others?


#### obh

##### Member
Happy to help

The group may not be normal but still be okay for the t-test if it is reasonably symmetric (you didn't send the histogram).

Correct, the Mann-Whitney U test may tell which group performs better, but it compares the entire distributions,
which may be better than comparing only one measure, the average.
For example, the average of online may be higher than the average of f2f, yet most of the people in f2f may still get a better score than those in online. In this case, the Mann-Whitney U test will favor f2f.
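The scenario above can be reproduced with a few made-up numbers. The Mann-Whitney U statistic for a group equals the number of head-to-head pairs (one score from each group) that the group wins, with ties counting one half; this pair-counting form is equivalent to the usual rank-sum formula. The sketch computes U only, not a p-value (`scipy.stats.mannwhitneyu` reports both).

```python
from statistics import mean

def mann_whitney_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j, ties counting 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u

online = [55, 58, 60, 100, 100]   # hypothetical scores
f2f = [66, 67, 68, 69, 70]        # hypothetical scores

u_f2f = mann_whitney_u(f2f, online)
u_online = mann_whitney_u(online, f2f)
pairs = len(f2f) * len(online)
print(mean(online), mean(f2f))          # online has the higher mean
print(u_f2f / pairs, u_online / pairs)  # but f2f wins 60% of the pairs
```

Here online's two perfect scores raise its average, while f2f beats online in 15 of the 25 pairings, so a rank test leans toward f2f.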

#### Vthompson

##### New Member
See the attached. I sent the box plot too.


#### Vthompson

##### New Member
Thanks so much!! That clears it up. I can proceed.
Have a good night

#### obh

##### Member
Great, good night, good afternoon in Melbourne

#### Vthompson

##### New Member
Greetings again. I hope you are able to help me again. I would really appreciate it.

Just to refresh your memory, I have blended, f2f and web courses whose students all learned in different settings but took a f2f final exam. I wish to find out which setting performed better on the final.

The blended and web settings are normal and passed the homogeneity test. I ran a t-test, and it found that there is a difference between the settings (P = .23, two-tailed), so I declared that the web setting performed better than the blended on the final because its mean and median scores were above the blended setting's results.

The f2f setting, as you saw from our correspondence this week, was not normal, so I performed a Mann-Whitney U test on web/f2f (P = .508, two-tailed) and blended/f2f (P = .282, two-tailed). Neither showed a difference between the settings. However, the results seem to contradict each other.

If blended and f2f did the same, and web and f2f did the same too, how did web do better than blended?
I included the data in Excel format. I also have the SPSS file and output if you need them.


#### obh

##### Member
There is no need to remind me.
I can't look at your data right now since I'm on the train; I will check later.

I will give a general example; I hope it fits.

Generally, in statistics you don't say that both groups are equal; you only say that the difference between the groups is not big enough to be considered significant. A more powerful test (for example, one with a bigger sample) may show otherwise.

Example: group scores a: 3, b: 5, c: 7.
Let's assume the smallest significant difference is 3.
So "a = b" and "b = c", but "a != c".
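The three-group example can be written out directly. A minimal sketch, assuming (as in the post) that a difference of 3 or more points counts as significant; real tests use p-values rather than a fixed point threshold, so this only illustrates why "no significant difference" is not transitive:

```python
from itertools import combinations

def compare_all(scores, threshold):
    """Map each pair of groups to True if their difference reaches threshold."""
    return {
        (g1, g2): abs(s1 - s2) >= threshold
        for (g1, s1), (g2, s2) in combinations(scores.items(), 2)
    }

results = compare_all({"a": 3, "b": 5, "c": 7}, threshold=3)
for (g1, g2), significant in results.items():
    print(f"{g1} vs {g2}:", "different" if significant else "no significant difference")
```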

In your case, you also need to consider that for the same data, the t-test and the Mann-Whitney U test won't give exactly the same answer.


#### Vthompson

##### New Member
LOL! I just had to do a recap. But glad you remembered my story.

I liked and agreed with what you said "General in statistics you don't say that both groups are equal, you only say that the difference between the groups is not big enough to be counted as significant." That makes sense to me.

However, I am not sure what to do next. I think I can trust my t-test because the data is normal, which is a better situation. I am thinking that, yes, I can conclude the web setting performed better than the blended from that t-test. However, I am not sure how the f2f ranked.

I will wait to hear from you again. Thanks as always.

#### obh

##### Member
Hi Vthompson,

Generally, when running several tests, you may need to use a smaller significance level.
For example, if you run 3 tests, each at a significance level of 0.05, each test will be correct (when there is no real difference) with probability 0.95, but the probability that all 3 tests will be correct is 0.95^3. This leaves you with a much bigger effective significance level (0.14 if the tests are independent; in your case it will be smaller, but still bigger than 0.05).
But in your case, the p-value is low (0.009 for unequal variances), which still leaves you with a good effective significance level (0.03).
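The arithmetic in this reply, sketched in Python: the familywise error rate for m independent tests, each at level α, is 1 - (1 - α)^m, and a Bonferroni-adjusted p-value multiplies a p-value by the number of tests (capped at 1). Reading the "0.03" above as 3 × 0.009 is my interpretation.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_adjusted_p(p, m):
    """Bonferroni-adjusted p-value for one of m tests (capped at 1)."""
    return min(1.0, p * m)

print(round(familywise_error(0.05, 3), 4))        # 0.1426, the "0.14" above
print(round(bonferroni_adjusted_p(0.009, 3), 3))  # 0.027, roughly the "0.03"
print(round(0.05 / 3, 4))                         # per-test alpha for 3 tests: 0.0167
```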

I see that the example fits your case: the biggest difference is between "Web" and "Blend" in the averages (but not in the medians).
Anyway, if I understand your results correctly, it is possible to get such results (I checked your t-test and it is okay).

Just to get a feeling, you can see that the standard deviations are bigger than the differences between the means.
How did you choose your sample size? (A larger sample size gives a test more power.)
If, for example, you run a test with the power to identify an effect of 10 points, you most likely will not identify an effect of 8 or 4 points.
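To get a feeling for this point, here is a rough power calculation in Python using the normal approximation to a two-sided, two-sample test of means. The group sizes 18 and 43 are the online and blended sizes from the first post; the common standard deviation of 15 points is purely an assumption for illustration. A calculation based on the t distribution would give slightly lower power for samples this small.

```python
from statistics import NormalDist

def approx_power(diff, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means.

    Uses the normal approximation and ignores the tiny chance of
    rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * (1 / n1 + 1 / n2) ** 0.5
    return NormalDist().cdf(abs(diff) / se - z_crit)

SD = 15  # assumed common standard deviation (hypothetical)
for diff in (10, 8, 4):
    print(diff, round(approx_power(diff, SD, 18, 43), 2))
```

Under these assumptions, power falls quickly as the true difference shrinks, which is the sense in which a study sized for a 10-point effect will usually miss a 4-point one.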


#### Vthompson

##### New Member
My sample was not picked; I had to get students from my courses to volunteer to participate. I was happy to have all of them participate, but some were under 18 years old at the time and by law couldn't. Also, some students dropped out of the course and did not take the final exam.

I think I get what you are saying; I am just unsure what to do next. I believe the results of the t-test too. If blended and f2f had no significant difference, should I just conclude the web course performed better on the exam?

#### Vthompson

##### New Member
I am just getting back to my computer. I just realized I did not do the Mann-Whitney test for blended and web; I ran one of the pairs twice by mistake. See the results from the blended/web comparison on page 3 of the attachment. The results are the same as the output from the t-test. I am happy about that.

I am almost out the door and will look at this again when I get home.


#### obh

##### Member
You should use the t-test for Blended and Web, as both are normally distributed.
But you shouldn't be surprised to get similar results, as the Mann-Whitney U test has about 95% efficiency relative to the two-sample t-test when the data are normal. If the t-test is appropriate, you can of course also use the Mann-Whitney U test (but you shouldn't, as it is a bit less powerful).

Regarding the conclusion, you can show the results. You can conclude that only one comparison is significant.
You can also estimate the power of the Mann-Whitney test based on a t-test, so you can understand the limitations of your tests.
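One rough way to do this: under normal data, the asymptotic relative efficiency of the Mann-Whitney test versus the t-test is 3/π ≈ 0.955 (the "95%" above), so a common shortcut approximates its power by running the t-test power formula with each sample size multiplied by 0.955. The Python sketch below does that with a normal approximation; the group sizes 18 and 43 come from the first post, while the 15-point standard deviation and 10-point difference are assumptions. Treat the result as a ballpark figure, not an exact power.

```python
import math
from statistics import NormalDist

ARE_MW_VS_T = 3 / math.pi  # asymptotic relative efficiency under normality, ~0.955

def approx_t_power(diff, sd, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test of means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(1 / n1 + 1 / n2)
    return NormalDist().cdf(abs(diff) / se - z_crit)

def approx_mw_power(diff, sd, n1, n2, alpha=0.05, are=ARE_MW_VS_T):
    """Rough Mann-Whitney power: a t-test on are * n observations per group."""
    return approx_t_power(diff, sd, are * n1, are * n2, alpha)

# Hypothetical inputs: 10-point difference, SD of 15, group sizes 18 and 43.
t_power = approx_t_power(10, 15, 18, 43)
mw_power = approx_mw_power(10, 15, 18, 43)
print(round(t_power, 3), round(mw_power, 3))
```

The two numbers come out close, which matches the experience above of the t-test and the Mann-Whitney test giving similar answers on normal data.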