# Very stuck with simple problem.

#### Stuck

##### New Member
Hi everyone, :wave:

I am at the end of my tether with trying to sort out the statistics for my report. It's simple stuff but I am hopeless with statistics.

I've struggled for weeks just trying to work out what data to use, and just when I thought I was getting somewhere, it's gone a bit wrong. I think.

Basically I have 2 groups of cows:

- A: those in their 1st or 2nd lactation (n = 24)
- B: those in their 3rd or later lactation (n = 16)

For each group I have somatic cell counts. I wanted to see whether there is a significant difference between the cell counts of young (group A) and old (group B) cows.
I did a t-test and it showed that these two sets of data ARE significantly different, which is great.
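
The post doesn't say which t-test was run. Below is a minimal sketch of one common choice, Welch's unequal-variance t-test (swapped in here, not necessarily what the poster used), written with Python's standard library only. `welch_t` is a hypothetical helper name, and the numbers are invented for illustration; they are not the cow data. It returns the t statistic and the Welch-Satterthwaite degrees of freedom only. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent
    samples; does not assume the two groups have equal variances."""
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Invented numbers for illustration only, NOT the poster's cell counts.
group_a = [110, 95, 130, 120, 105]
group_b = [100, 140, 420, 115, 460]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```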

BUT! I don't know if my results are worth anything. I have now read somewhere that you can only use a t-test if the data is normally distributed, and I don't know whether mine is. Plus I am not sure my sample sizes are big enough, but I can't get any more cows.

I'm not a maths student and it's never been my strong point, hence I am really struggling with these statistics. I wanted to write a good report but it's overdue anyway and I am close to just giving up and submitting any old rubbish. :shakehead

(very) Stuck

#### JohnM

##### TS Contributor
To check to see if your data is normally distributed, you can do an Anderson-Darling test, and also do a normal probability plot.
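
A rough sketch of the Anderson-Darling statistic for normality, assuming the mean and standard deviation are estimated from the same data (`anderson_darling_normal` is a hypothetical name). It sorts the values, compares each one with the fitted normal CDF, and gives extra weight to the tails. The small-sample adjusted A*² is what is usually compared with published tables; the commonly cited 5% critical value for this adjustment is about 0.752, but check a reference before relying on it.

```python
from math import log
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A^2, adjusted A*^2) for a normality check, with the normal
    mean and SD estimated from the data itself."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(Y_i) + ln(1 - F(Y_{n+1-i}))]
    s = sum((2 * i - 1) * (log(cdf[i - 1]) + log(1 - cdf[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s / n
    # Small-sample adjustment used when mean and SD are estimated
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_adjusted
```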

If both of these above suggest that a lack of normality exists, then you can use the Mann-Whitney U-test, which is the nonparametric analogue to the t-test.

In general though, the lack of normality has to be pretty severe before you need to worry about it.

#### Stuck

##### New Member

I checked for normality visually with a graph, and it doesn't look normal; it looks skewed.

Is the Mann-Whitney U-test easy? I had never heard of it, so please forgive my ignorance. *blush*

#### JohnM

##### TS Contributor
Can you attach your data set? Let me take a look to see whether it justifies using a nonparametric test. One thing against using a nonparametric test is the unequal sample sizes.

#### Stuck

##### New Member
Er... hope this attachment works!

I looked in my notes and read that the Mann-Whitney U-test is 'especially useful if sample sizes differ', so I'm guessing that's the way to go?

I have stuck the data into one of those websites which calculate it for you but am having trouble interpreting the results.

#### JohnM

##### TS Contributor
I think what you can say here is that the variances are different, and that's what appears to cause the difference in averages. In other words, the B group has a few high data points (in the 400s), but many of its values are in the same general range as the A group.

So, I think you're limited to saying that the B group has a higher variability than A.
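
If the claim is "B varies more than A", a test of equal spread fits better than a test of means. One option (swapped in here, not mentioned in the thread) is the Brown-Forsythe version of Levene's test: a one-way ANOVA on each value's absolute distance from its own group median, which is fairly robust to skewed data. `brown_forsythe` is a hypothetical name; the sketch returns the statistic and its F degrees of freedom. Turning that into a p-value needs an F distribution, for example via `scipy.stats.levene(a, b, center='median')`.

```python
from statistics import mean, median


def brown_forsythe(*groups):
    """Brown-Forsythe statistic: one-way ANOVA F computed on each value's
    absolute deviation from its own group median. Returns (W, (df1, df2))."""
    devs = [[abs(v - median(g)) for v in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    group_means = [mean(d) for d in devs]
    grand = mean([v for d in devs for v in d])
    # Between-group and within-group mean squares of the deviations
    between = sum(len(d) * (m - grand) ** 2
                  for d, m in zip(devs, group_means)) / (k - 1)
    within = sum((v - m) ** 2
                 for d, m in zip(devs, group_means) for v in d) / (n_total - k)
    return between / within, (k - 1, n_total - k)
```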

Is there anything you can think of that makes the high data point cows in B different from the other data points in B?
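
To pick out the "high data point" cows before looking for a reason, one common rule of thumb (an assumption here, not something from the thread) is Tukey's upper fence: anything above Q3 + 1.5 × IQR. The hypothetical `tukey_high_outliers` helper returns positions as well as values, so each flagged count can be matched back to a cow's records. `statistics.quantiles` uses its default "exclusive" method, so the quartiles may differ slightly from a spreadsheet's.

```python
from statistics import quantiles


def tukey_high_outliers(data, k=1.5):
    """(index, value) pairs lying above Tukey's upper fence Q3 + k * IQR."""
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    fence = q3 + k * (q3 - q1)
    return [(i, v) for i, v in enumerate(data) if v > fence]
```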

#### Stuck

##### New Member
Does that mean I can't compare the two sets?

#### JohnM

##### TS Contributor
You can certainly compare them, but I don't think you can conclude, with conviction, that the average cell counts are different.

What I see is that there's more variation in B's cell counts (which causes the difference in averages) - in other words, some cows in B have higher cell counts, but not all of them do....

#### Stuck

##### New Member
OK, I understand what you mean.

Hope you don't mind me asking just one more little question: if I do a test such as the Mann-Whitney U-test and the result comes back as 'significantly different', why is that result not valid? Is it simply because the sample sizes are unequal?

#### JohnM

##### TS Contributor
Mann-Whitney first ranks all the cows together, regardless of group membership, then compares the rank sums of the two groups. If one group has a much larger sample size, it can get a higher rank sum just because it has more items. In other words, it has an "unfair" advantage.
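
A sketch of the Mann-Whitney calculation, with hypothetical helper names. The raw rank sum does grow with group size, but the U statistic subtracts n1(n1 + 1)/2 (the smallest rank sum that group could possibly have), and its reference distribution, centred on n1·n2/2 with a spread that depends on both group sizes, takes the sizes into account. The p-value here is a normal approximation without a tie correction; `scipy.stats.mannwhitneyu(a, b, alternative='two-sided')` gives exact or tie-corrected values for small samples.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Rank all values together (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistics for groups a and b, plus a normal-approximation z and
    two-sided p-value (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))  # rank everyone, ignoring group
    r1 = sum(ranks[:n1])                 # rank sum of group a
    u1 = r1 - n1 * (n1 + 1) / 2          # remove the part due purely to group size
    u2 = n1 * n2 - u1                    # U1 + U2 always equals n1 * n2
    mu = n1 * n2 / 2                     # expected U1 if the groups don't differ
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, u2, z, p
```

With the thread's data this would be called as `mann_whitney_u(counts_a, counts_b)`, where `counts_a` and `counts_b` are hypothetical names for the two groups' cell counts.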

#### Stuck

##### New Member
Right, that makes sense.

I think I'm going to have to re-write the whole thing at this rate!

Thanks for your help, you've been great.