# Two way repeated measures with zeros, non integer values, and non-normal distribution

#### choppedpete

##### New Member
I would really appreciate some help with a few models I am trying to run. Essentially my data looks at how often a subject was visited depending on treatment and subject type across two years. The data looks like this:

Year | Subject | SubjectType | Treatment  | BlockNum | NumberOfVisits | DurationOfVisits
-----+---------+-------------+------------+----------+----------------+-----------------
1    | 1       | Type1       | Treatment1 | 1        | 14             | 15.6
2    | 1       | Type1       | Treatment1 | 1        | 0              | 0
1    | 2       | Type2       | Treatment2 | 2        | 3              | 4.3
2    | 2       | Type2       | Treatment2 | 2        | 0              | 0

and so on for 200 subjects with a measurement for each year.

Essentially I want to create a model that tests whether the number of visits / duration of visits differ between treatments and subject types across both years, and whether there are any interactions. BlockNum refers to the experimental design being split into three randomised blocks (three blocks of plants growing in a greenhouse). I have tried a bunch of different models and can't seem to get a good resolution:

Repeated Measures ANOVA:

Code:
model1 <- aov(NumberOfVisits ~ SubjectType*Treatment*Year*BlockNum + Error(Subject/Year), data=dframe1)
However, the issue with this is that the data is left skewed and contains zeros (so a plain log won't work), and I cannot get it close to normal with any of the following transformations:

Code:
trans_Y <- (dframe1$NumberOfVisits)^3
trans_Y <- (dframe1$NumberOfVisits)^(1/9)
trans_Y <- log(dframe1$NumberOfVisits)
trans_Y <- log(dframe1$NumberOfVisits+0.1)
trans_Y <- log(dframe1$NumberOfVisits+0.000001)
trans_Y <- log10(dframe1$NumberOfVisits)
trans_Y <- exp(dframe1$NumberOfVisits)
trans_Y <- abs(dframe1$NumberOfVisits)
trans_Y <- sin(dframe1$NumberOfVisits)
trans_Y <- asin(dframe1$NumberOfVisits)
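For anyone following along, here is a minimal base-R sketch of how some of these transformations behave on counts that include zeros. The vector `visits` is made up for illustration and is not the real data.

```r
# Hypothetical visit counts including zeros (not the real data)
visits <- c(0, 0, 3, 14, 7, 0, 22)

log_raw   <- log(visits)        # log(0) is -Inf, so every zero becomes infinite
log_shift <- log(visits + 0.1)  # finite, but the result depends on the constant added
log_1p    <- log1p(visits)      # log(1 + x), which maps 0 to 0
sqrt_v    <- sqrt(visits)       # also defined at zero
asin_v    <- suppressWarnings(asin(visits))  # NaN for values outside [-1, 1]

print(data.frame(visits, log_raw, log_shift, log_1p, sqrt_v, asin_v))
```

Any of these maps all the zeros to one shared value, so a large group of zeros stays a spike in the transformed data, which may be why none of the transformations gives a normal-looking distribution.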
As such I then tried a generalised linear mixed model (GLMM):

Code:
library(lme4)

model1 <- glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = inverse), data = dframe1)
However this returns:

Code:
    Warning message:
In glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1 | Year),  :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
So I tried lmer instead:

Code:
model1 <- lmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = identity), data = dframe1)
Warning in lme4::lmer(formula = NumberOfVisits ~ SubjectType * Treatment * BlockNum +  :
Error in (function (optimizer = "bobyqa", restart_edge = TRUE, boundary.tol = 1e-05,  :
unused arguments (tolPwrss = 1e-07, compDev = TRUE, nAGQ0initStep = TRUE, checkControl = list(check.nobs.vs.rankZ = "ignore", check.nobs.vs.nlev = "stop", check.nlev.gtreq.5 = "ignore", check.nlev.gtr.1 = "stop", check.nobs.vs.nRE = "stop", check.rankX = "message+drop.cols", check.scaleX = "warning", check.formula.LHS = "stop", check.response.not.const = "stop"), checkConv = list(check.conv.grad = list(action = "warning", tol = 0.001, relTol = NULL), check.conv.singular = list(action = "ignore", tol = 1e-04),
check.conv.hess = list(action = "warning", tol = 1e-06)))
1: In lmer(NumberOfVisits ~ SubjectType * Treatment * BlockNum + (1 | Year),  :
2: In lme4::glmer(formula = NumberOfVisits ~ SubjectType * Treatment * BlockNum +  :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
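Reading the two warnings together, it looks as though `lmer()` passes any call that has a `family` argument on to `glmer()`, which then passes a Gaussian identity-link model back to `lmer()`, and the control arguments no longer match. A minimal sketch of calling `lmer()` without `family`, assuming lme4 is installed. The data frame `sim` and all its values are simulated; only the column names follow the post.

```r
library(lme4)

set.seed(42)
# Simulated design: 2 subject types x 4 treatments x 8 replicate subjects
subjects <- expand.grid(SubjectType = c("Type1", "Type2"),
                        Treatment = paste0("Treatment", 1:4),
                        rep = 1:8)
subjects$Subject <- factor(seq_len(nrow(subjects)))
sim <- merge(subjects, data.frame(Year = factor(1:2)))  # each subject measured in both years

# Invented response with a subject-level random effect
subj_effect <- rnorm(nlevels(sim$Subject), sd = 2)
sim$DurationOfVisits <- 10 + subj_effect[as.integer(sim$Subject)] +
  rnorm(nrow(sim), sd = 3)

# lmer() already fits a Gaussian model with identity link, so no `family` is passed
fit <- lmer(DurationOfVisits ~ SubjectType * Treatment + (1 | Subject), data = sim)
print(fixef(fit))
```

The random effect here is `(1 | Subject)` rather than `(1 | Year)`, on the assumption that the repeated measurements are grouped by subject; that is a modelling choice for the sketch, not something the error message requires.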
And when I use a log or inverse link function (I presume this won't work because of the zeros?):

Code:
model1 <- glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = log), data = dframe1)    # random intercept
Error in eval(expr, envir, enclos) :
cannot find valid starting values: please specify some
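This message seems to come from the Gaussian family's start-up check, which refuses a log link when the response contains values at or below zero (and an inverse link when it contains zeros), because the default starting means are taken from the response itself. A minimal sketch with plain `glm()` standing in for `glmer()`, on invented data:

```r
set.seed(1)
x <- factor(rep(c("a", "b"), each = 10))
y <- c(rep(0, 4), rpois(16, lambda = 5))  # made-up counts with some zeros

# With default starting values the fit stops before any iteration
msg <- tryCatch(
  {
    glm(y ~ x, family = gaussian(link = "log"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Supplying strictly positive starting means lets the fit proceed
fit <- glm(y ~ x, family = gaussian(link = "log"), mustart = pmax(y, 0.5))
print(coef(fit))
```

`glmer()` has a `mustart` argument as well. Whether a Gaussian model with a log link is a sensible choice for counts with many zeros is a separate question.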
The following Poisson model will work for 'Number of visits', but not for 'Duration of visits' (as it has non-integer values):

Code:
model1 <- glmer(DurationOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = poisson, data = dframe1)
However this returns the following, which doesn't tell me the significance of 'Treatment' as a whole, but rather the significance of each individual treatment level:

Code:
summary(model1):

Call:
lm(formula = DurationOfVisits ~ SubjectType*Treatment*BlockNum, data = dframe1)

Residuals:
Min      1Q  Median      3Q     Max
-8.3251 -3.9093 -0.5325  2.1748 18.9394

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)                                 3.9967     1.6053   2.490   0.0138 *
Treatment2                                  3.1750     2.1018   1.511   0.1328
Treatment3                                  0.1306     2.1018   0.062   0.9505
Treatment4                                 -0.7279     2.1018  -0.346   0.7295
...
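The coefficient table compares each treatment level with the reference level, which is why `Treatment` appears as three separate rows. A minimal sketch of getting one test per model term instead, using base R's `anova()` on an `lm` fit to invented data (the data frame `d` is hypothetical):

```r
set.seed(7)
# Invented data: 4 treatments x 2 subject types, 10 replicates each
d <- expand.grid(Treatment = paste0("Treatment", 1:4),
                 SubjectType = c("Type1", "Type2"),
                 rep = 1:10)
d$DurationOfVisits <- 4 + 2 * (d$Treatment == "Treatment2") +
  rnorm(nrow(d), sd = 3)

fit <- lm(DurationOfVisits ~ SubjectType * Treatment, data = d)

coef_rows <- rownames(summary(fit)$coefficients)  # one row per level contrast
term_tab  <- anova(fit)                           # one row per model term

print(coef_rows)
print(term_tab)
```

Note that `anova()` on an `lm` gives sequential tests, so with unbalanced data the order of terms in the formula matters.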
I would really appreciate some help with this. I feel like I'm missing something obvious here. It has been quite a number of long days deep in R and my brain is a bit frazzled.

I'm relatively new to R, so explanations in relatively simple terms would be appreciated!

Thanks a lot ahead of time!

#### choppedpete

##### New Member

Just to note (I think) I have made progress today, have altered my models, and after a while I have arrived at the following:

Code:
model2 <- glmer(NumberOfVisits ~ SubjectType*Treatment*Year + (1|BlockNum)+ (1|Subject), family = poisson (link=sqrt), data = dframe1)
and as this won't work for non-integer values, I am using the following for 'duration':

Code:
model1 <- lmer(DurationOfVisits ~ SubjectType*Treatment*Year + (1|BlockNum)+ (1|Subject), data = dframe1)
I can't get any other families or link functions to work in either of them for some reason.
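On the link functions: in a Poisson model the log link is applied to the expected count, not to the observed values, so zeros in `NumberOfVisits` should not by themselves rule it out. A minimal sketch on simulated data, assuming lme4 is installed. Everything in `sim` is invented, and the block random effect is left out to keep the example small.

```r
library(lme4)

set.seed(3)
subjects <- expand.grid(SubjectType = c("Type1", "Type2"),
                        Treatment = paste0("Treatment", 1:4),
                        rep = 1:8)
subjects$Subject <- factor(seq_len(nrow(subjects)))
sim <- merge(subjects, data.frame(Year = factor(1:2)))  # every subject in both years

# Simulated counts with a subject-level effect and a lower mean in year 2
subj_effect <- rnorm(nlevels(sim$Subject), sd = 0.5)
mu <- exp(1 + subj_effect[as.integer(sim$Subject)] - 1.5 * (sim$Year == "2"))
sim$NumberOfVisits <- rpois(nrow(sim), mu)

# Poisson GLMM with the default log link, fitted to a response that contains zeros
fit <- glmer(NumberOfVisits ~ SubjectType * Treatment + Year + (1 | Subject),
             family = poisson, data = sim)
print(sum(sim$NumberOfVisits == 0))
print(fixef(fit))
```

This only covers the count response; a Poisson family is still not suitable for the non-integer durations.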

In addition, I have figured out that I can use
Code:
library(car)                 # provides Anova()
Anova(model1, type = "III")  # the argument name is lowercase `type`
to generate test statistics for treatment/subject type/year.

Am I along the right lines? Essentially I am trying to test whether the dependent variable differs significantly between subject type / treatment / year, including any interactions between these. However, my distributions are generally left skewed and non-normal, some variables take non-integer values, and the design involves repeated measures, so it is a little more complicated than I am used to!

Thanks again, and have a nice evening!