Statistically significant disagreement between beekeepers

#1
Hi, I recently posted here asking for help with a small hobbyist beekeeping study and got great help from user Karabiner.

Because of this, I'm hoping that the community can offer some extra statistical advice that will also settle a disagreement between myself and my colleague.

In essence, we have a database going back seven years (2013 - 2019) consisting of records we've kept. In these records, we have the levels of parasites found (Varroa destructor) on the bees. This parasite is pretty common and is typically recorded as either low, medium or high (To use it in SPSS I converted it to 1=low, 2=medium, 3=high).

We hope to see if there were any "bumper" years of varroa mite presence in our hives. My colleague thinks we should do a Pearson's correlation, but I was under the impression that Spearman's rank correlation was what you needed for ordinal data like low, medium and high?

Is either (or neither) of us correct?

Thanks for any help!
 

Miner

TS Contributor
#2
Regarding correlation of ordinal data, you are correct. Pearson's should not be used for ordinal data, but Spearman's may be used. The real question is whether correlation is the correct analysis. Can you provide more information on your hypothesis? What is the other variable with which you are trying to correlate?
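
To see why the choice matters for low/medium/high codes, here is a minimal sketch in plain Python (used only for illustration; the thread itself uses SPSS). It computes Spearman's coefficient as Pearson's coefficient applied to tie-averaged ranks. The data and the name `other_variable` are made up, since the second variable has not been named yet. The point is that Spearman's value depends only on the ordering of the codes, while Pearson's value changes if "high" is coded as 10 instead of 3.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example data: varroa codes (1=low, 2=medium, 3=high) and a
# hypothetical second variable measured on the same eight records.
varroa_level = [1, 1, 2, 2, 3, 3, 2, 1]
other_variable = [10.5, 11.0, 13.2, 12.8, 30.0, 16.1, 12.9, 9.7]

# Same ordering, but "high" arbitrarily coded as 10 instead of 3.
recoded_level = [{1: 1, 2: 2, 3: 10}[v] for v in varroa_level]

print("Pearson: ", pearson(varroa_level, other_variable),
      pearson(recoded_level, other_variable))
print("Spearman:", spearman(varroa_level, other_variable),
      spearman(recoded_level, other_variable))
```

The two Spearman values agree because the recoding keeps the order of the categories; the two Pearson values do not, because Pearson treats the gaps between codes as meaningful distances.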
 
#3
#4
But these numbers 1, 2, 3 correspond to actual counts of varroa, don't they? If you have the actual numbers, like 253 counted, that would be better. Then you could use that number, or the log or the square root of that number, to compare years.

If not, I would think of the 1, 2, 3 numbers as scaled values of the original counts, so that the numbers are on a ratio scale (very rounded numbers).

I would prefer to compare the average number of mites per bee yard and compare that over the years. (And to compute the average based on one value per bee hive.) Is one year statistically significantly higher than other years? You can do a t-test.
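
As a rough sketch of that comparison, the plain-Python snippet below computes Welch's t statistic (the unequal-variance form of the two-sample t-test) for two years. The scores are invented, one value per hive, and the variable names are placeholders. Two assumptions are built in: the 1/2/3 codes are treated as numbers, and the two years are treated as independent samples, which would not hold if the same hives are followed from year to year. The standard library has no t distribution, so the p-value would have to come from a table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error contributed by each sample.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up scores, one value per hive (1=low, 2=medium, 3=high).
scores_2018 = [1, 1, 2, 1, 2, 1, 2, 2]
scores_2019 = [2, 3, 2, 3, 3, 2, 3, 1]

t, df = welch_t(scores_2018, scores_2019)
print(f"mean 2018 = {mean(scores_2018):.2f}, mean 2019 = {mean(scores_2019):.2f}")
print(f"t = {t:.2f}, df = {df:.1f}")
```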

(Correlation coefficients are about how two data sets are related. I can't see how that would be used here.)
 
#5
We don't have the actual numbers, unfortunately; we're trained to make guesses based on observations. So low, for example, could be something like 1-3 mites for every 30 or so bees, and they go in tiers like that: (1-3) low, (3-6) medium-ish, (6+) high, etc.

Does this help?
 

Miner

TS Contributor
#6
You could go with an ordinal logistic regression, with temperature as a continuous predictor, pesticide ban as a categorical predictor and mite concentration as the ordinal response.
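
To make the structure of such a model concrete, here is a small plain-Python sketch of the proportional-odds form, in one common parameterisation: logit P(level <= j) = threshold_j - (b_temp * temperature + b_ban * ban). It does not fit anything. The thresholds and slopes are invented purely for illustration, and in practice they would be estimated by your statistics package.

```python
from math import exp


def logistic(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-z))


def level_probabilities(temperature, ban_in_force, thresholds, b_temp, b_ban):
    """Return [P(low), P(medium), P(high)] under a proportional-odds model.

    Model form: logit P(level <= j) = thresholds[j] - linear predictor.
    """
    eta = b_temp * temperature + b_ban * (1 if ban_in_force else 0)
    p_le_low = logistic(thresholds[0] - eta)      # P(level <= low)
    p_le_medium = logistic(thresholds[1] - eta)   # P(level <= medium)
    return [p_le_low, p_le_medium - p_le_low, 1.0 - p_le_medium]


# All numbers below are invented for illustration, not estimates from data.
THRESHOLDS = (2.0, 4.0)  # cut-points between low|medium and medium|high
B_TEMP = 0.15            # hypothetical slope per degree of temperature
B_BAN = -0.8             # hypothetical effect of the pesticide ban

for temp, ban in [(20, False), (20, True), (30, False)]:
    probs = level_probabilities(temp, ban, THRESHOLDS, B_TEMP, B_BAN)
    print(temp, ban, [round(p, 3) for p in probs])
```

The same slopes shift all cumulative probabilities at once, which is the "proportional odds" assumption that this kind of model relies on.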
 

noetsi

Fortran must die
#7
Be careful if you are not used to ordinal regression, because it has issues that linear regression does not. Make sure you understand what odds ratios mean, don't look for an R-squared, and be aware that the assumptions you test for differ substantially between ordinal and linear regression. For instance, overdispersion is a major issue with logistic regression but is largely ignored in linear regression. Don't try to interpret the raw slopes - they are incomprehensible to carbon life forms. :p Use odds ratios.
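
For what it's worth, going from a raw slope to an odds ratio is just exponentiation, since the slopes live on the log-odds scale. The sketch below uses invented slopes (not estimates from any data) and the hypothetical helper `odds_ratio`. Whether a ratio above 1 means "more likely to be in a higher category" or the reverse depends on the sign convention of the software, so check its documentation before interpreting.

```python
from math import exp


def odds_ratio(slope, step=1.0):
    """Odds ratio for a `step`-unit increase in a predictor with log-odds slope `slope`."""
    return exp(slope * step)


# Invented slopes on the log-odds scale, for illustration only.
b_temp = 0.15   # per degree of temperature
b_ban = -0.8    # pesticide ban in force vs. not

print(f"per 1 degree:  {odds_ratio(b_temp):.2f}")
print(f"per 5 degrees: {odds_ratio(b_temp, step=5):.2f}")
print(f"ban vs no ban: {odds_ratio(b_ban):.2f}")
```

A slope of 0 gives an odds ratio of exactly 1 (no effect), and ratios multiply: the 5-degree ratio is the 1-degree ratio raised to the fifth power.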
 

Karabiner

TS Contributor
#8
"In essence, we have a database going back seven years (2013 - 2019) consisting of records we've kept. In these records, we have the levels of parasites found (Varroa destructor) on the bees. This parasite is pretty common and is typically recorded as either low, medium or high (To use it in SPSS I converted it to 1=low, 2=medium, 3=high)."

So you have complete data for n (how many?) beehives across 7 years (2013-19),
and you can always tell which 7 measurements belong to the same beehive?

"we hope to see if there were any 'bumper' years of varroa mite presence in our hives."

You could just display this graphically ("which percentage of the hives showed
level 1, 2 or 3, respectively, in each year"). And there are statistical tests for
a) the global hypothesis that the levels are distributed identically across the
years (in the population from which your data were sampled), and b) for pairwise
comparisons between years.

"so essentially the Hypothesis is that one of these environmental factors has had an
effect on the levels of Varroa mites present on one year or another."

Are such ideas actual hypotheses to be tested, or are they just some of the possible
post-hoc explanations for statistically significant "bumpers" (if there are any)?

With kind regards

Karabiner
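
The graphical summary suggested in the last reply (percentage of hives at level 1, 2 or 3 in each year) starts from a simple table. A minimal plain-Python sketch of that tabulation follows. The records are invented, and the `(year, level)` layout and the name `level_percentages` are assumptions made for this example; the real data would come from the SPSS file.

```python
from collections import Counter

LEVELS = {1: "low", 2: "medium", 3: "high"}


def level_percentages(records):
    """Map each year to the percentage of hive records at each level.

    `records` is an iterable of (year, level) pairs, level coded 1, 2 or 3.
    """
    counts = {}
    for year, level in records:
        counts.setdefault(year, Counter())[level] += 1
    table = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        table[year] = {name: 100.0 * counter[code] / total
                       for code, name in LEVELS.items()}
    return table


# Made-up records: four hives in each of two years.
records = [(2018, 1), (2018, 1), (2018, 2), (2018, 3),
           (2019, 2), (2019, 3), (2019, 3), (2019, 3)]

for year, row in level_percentages(records).items():
    print(year, "  ".join(f"{name}: {pct:.0f}%" for name, pct in row.items()))
```

Each row of the table can be drawn as one stacked bar, which makes an unusually heavy year easy to spot before any formal test is run.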