Issues when including confounding variable in ANOVA

#1
Hi,

I was hoping someone could help me out with an ANOVA question. I'm currently looking to see what effect grazing has had at different years on plant diversity in a grassland vegetation community. I initially conducted a one-way ANOVA, however, it was only later on that I realised I had a potential confounding variable. The grassland community is broadly recognised as wet or dry grassland. These two communities can respond differently to disturbance (grazing). I will say that despite appropriate replication, the sample sizes per grazing year were small and unequal. Due to unequal samples sizes I used Gabriel's post-hoc test to analyse any significant pairwise differences.

Please see my initial set up below (one-way ANOVA):

Dependent variable (DV): species richness (number of species)
(Factor 1)(4 levels) Independent variable (IV): time since grazing in years (1, 2, 3, 17)

Here are the initial SPSS outputs:

SPSS_ONE_FACTOR.png

SPSS_ONE_FACTOR2.png


So, the result was significant when analysing only one factor (grazing). There was a difference in species richness when the grassland was grazed at 1 year and 17 years. I realised then that the grassland type (wet or dry) may be having an effect on species richness between the 4 grazing periods. As said earlier, the communities can respond differently (faster or slower recovery), although this is still questionable. So this presented me the challenge of how to include the grassland type as a confounding factor into the analysis.

So my new set up was this:

Dependent variable (DV): species richness (number of species)
(Factor 1)(4 levels) Independent variable (IV): time since grazing in years (2, 5, 9, 17)
(Factor 2)(2 levels) Independent variable (IV): grassland type (wet, dry)

After some research, I came to the conclusion that I could potentially use either a two-way ANOVA, a mixed model, or a repeated measures ANOVA. The last one I am not sure on because the repeated measures assumes that the samples were taken at different times. However, in my study the samples were all taken at the same time.

I decided to run with the two-way ANOVA and have the two IV's as fixed factors (I'm not even sure this is the correct test here). And this is where I got stuck. I ran into two problems. The first problem is that the two IV's are unevenly spread across the samples. So for example, there might be a sample for 17 years since grazing for dry grassland but not 17 years for wet grassland. This was a limitation of the study. This is nicely illustrated in the plot below:

SPSS_TWO_FACTOR.png

The second problem was that I was getting no statistics for the interaction analysis. I could expand the decimal points for the significant value but that was it. I have circled the problem below:


SPSS_TWO_FACTOR2.png

I would have assumed that df would be 1? I'm guessing that the absence of data here has something to do with uneven samples per IV's? In other words it couldn't estimate a mean species richness for the two IV's at the same time due to no data.

Just for arguments sake, I have included the post hoc tests for the two-way ANOVA below:

SPSS_TWO_FACTOR3.png

So, it's still significant (1 vs 17 years) albeit a little more (0.06 cf. 0.011) than it was when grassland type wasn't included.

I guess I'm wondering about a few things:

Firstly, is there another way around this? In other words is there another way to include the confounder in light of the uneven samples per IV levels?

Secondly, is the two-way ANOVA the appropriate test here? Is an interaction the same as a confounding variable? There are arguments both ways.

Lastly, Is it a problem to not have estimated marginal means for the two grassland types across the 4 levels of grazing? (see plot above).

Any other thoughts or suggestions would be greatly appreciated here.

Cheers.