I am working on a non-linear statistical mixed effects model using package nlme in the R software.. The response variable is a cumulative flux of carbon dioxide, representing soil carbon mineralisation, during soils incubation at controlled temperature and humidity.

The soils come from 9 different plots grouped in 3 blocks (it’s a split-plot design). Therefore, there are 3 plots by block. Each plot of a block is associated with a treatment, either FB, FP or TEM corresponding to different soil fertilisation. All treatments are tested in each block.

Soils are sampled from each plot and then incubated. At 10 dates, the carbon dioxide produced since the last measure is measured.

At last, the dataset contains 3 block (each with 3 plots) x 3 treatments (one for each plot in each block) x 10 measure dates = 90 data

In my domain, the cumulative produced carbon dioxide can be modelled by the equation below:

co2(t) = C0*(cl*(1-exp(-kl*t)+(1-cl)*(1-exp(-kr*t)))

where co2(t) is the cumulative produces carbon dioxide at time t, C0 is the soil carbon content at t=0 (beginning of the incubation), cl is the proportion of carbon that correspond to an easily mineralised pool of carbon and kl and kr are respectively the constant of mineralisation (determining the speed of mineralisation) of the easily and hardly mineralised pools of soil carbon.

The aim of the study is to assess if the covariate ‘treatment’ can explained (at least in part) the different values of the model parameters (cl, kl and kr) from the different plots. To do this, I fit the non linear model with multi-level random effects (plot nested in block) on the parameter and investigate the relevance to take into account the treatment covariate in the fixed part of the model parameters. This procedure is well explained in Bates and Pinhero book: Mixed-Effects models in S and S-plus.

I managed to adjust a model that fit the data well enough with no significant departure from the assumptions (residuals normality, homoscedasticity,...etc). In R, the model look like this :

Code:

```
mixedmodel <- nlme(model = cumulcmin~c0*(cl*(1-exp(-kl*time))+(1-cl)*(1-exp(-kr*time))),
data = data,
fixed=c(cl~1,kl~1,kr~treatment),
random=list(block=cl~1,plots=pdDiag(cl+kr~1)),
start = c(fixcoef[1],fixcoef[2],fixcoef[3],0,0))
```

Now, I am wondering something about these assumptions and how the response variable is built. Indeed, at each measure date, we measure the carbon dioxide produced since the last measure date. However, we work on the cumulative produced carbon dioxide, so the response variable at a specific measure date is actually the sum of measured produced carbon dioxyde at each previous measure date :

co2(date 1)=co2(measured at date 1)

co2(date 2)=co2(measured at date 1)+co2(measured at date 2)

co2(date 3)=co2(measured at date 1)+co2(measured at date 2)+co2(measured at date 3)

…

So I am guessing that by construction, the assumption of independence between residuals of the model is not respected, which is kind of confirmed by the attached auto-correlation plot.

However, this auto-correlation plot appears not typical of auto-correlation plot encountered in time series analysis. I tried to add some correlation structure (AR1, ARMA, ..) in the errors model but without improvement. I am not an expert in correlation structure so maybe there is something that I don’t see.

Any suggestion about this?

Thanks a lot and sorry for the long post!