I would really appreciate some help with a few models I am trying to run. Essentially, my data record how often a subject was visited, depending on treatment and subject type, across two years. The data look like this:
Year | Subject | SubjectType | Treatment  | BlockNum | NumberOfVisits | DurationOfVisits
-----+---------+-------------+------------+----------+----------------+-----------------
1    | 1       | Type1       | Treatment1 | 1        | 14             | 15.6
2    | 1       | Type1       | Treatment1 | 1        | 0              | 0
1    | 2       | Type2       | Treatment2 | 2        | 3              | 4.3
2    | 2       | Type2       | Treatment2 | 2        | 0              | 0
and so on for 200 subjects with a measurement for each year.
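For reference, the layout above can be mocked up as a small data frame. This is only a sketch with made-up values: the name `toy` and the six subjects are hypothetical stand-ins for the real `dframe1`, and it assumes the numeric codes (Year, Subject, BlockNum) are meant to be treated as grouping factors rather than as numbers.

```r
# Toy data in the same long format as the table above (all values are made up).
set.seed(1)
n_subjects <- 6  # the real data set has 200 subjects
toy <- data.frame(
  Year        = rep(1:2, times = n_subjects),
  Subject     = rep(seq_len(n_subjects), each = 2),
  SubjectType = rep(c("Type1", "Type2"), each = 2, length.out = 2 * n_subjects),
  Treatment   = rep(paste0("Treatment", 1:3), each = 2, length.out = 2 * n_subjects),
  BlockNum    = rep(1:3, each = 4, length.out = 2 * n_subjects)
)
toy$NumberOfVisits   <- rpois(nrow(toy), lambda = 3)
# Zero visits gives zero duration, as in the example rows.
toy$DurationOfVisits <- round(toy$NumberOfVisits * runif(nrow(toy), 0.5, 1.5), 1)

# Assumption: the numeric codes are labels, so store them as factors.
id_cols <- c("Year", "Subject", "SubjectType", "Treatment", "BlockNum")
toy[id_cols] <- lapply(toy[id_cols], factor)
str(toy)
```

If Year or BlockNum are left as numbers, R treats them as continuous covariates in a model formula, which is probably not what is intended for a two-year, three-block design.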
Essentially, I want to create a model that tests whether the number of visits / duration of visits differ between treatments and subject types across both years, and whether there are any interactions. BlockNum refers to the experimental design being split into three randomised blocks (three blocks of plants growing in a greenhouse). I have tried a bunch of different models and can't seem to arrive at a satisfactory one:
Repeated Measures ANOVA:
However, the issue with this is that the data is left skewed with zeros in it (so log won't work), and I cannot successfully transform it with any of the following:
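To make the problem with the zeros concrete, here is a minimal sketch using the four NumberOfVisits values from the example rows. The variable names are hypothetical, and `log1p()` is shown only as the usual base-R way of taking log(1 + x), not as a recommended fix.

```r
visits <- c(14, 0, 3, 0)   # NumberOfVisits from the four example rows

log_vals   <- log(visits)    # -Inf wherever the count is 0
log1p_vals <- log1p(visits)  # log(1 + x): finite, and exactly 0 for a zero count
# asin() is only defined on [-1, 1], so counts above 1 become NaN (with a warning).
asin_vals  <- suppressWarnings(asin(visits))
abs_vals   <- abs(visits)    # no change at all for non-negative counts

print(rbind(log_vals, log1p_vals, asin_vals, abs_vals))
```

None of these transformations changes the fact that many observations sit exactly at zero, which is why the residuals stay non-normal whichever one is used.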
As such, I then tried a generalised linear mixed model (GLMM):
However this returns:
And when I use a log or inverse link function (I presume this won't work because of the zeros?):
The following will work for 'Number of visits', but not 'Duration of visits' (as it has non-integer values):
However, this returns the following, which doesn't tell me the significance of 'Treatment' itself, but rather the significance of each level within Treatment:
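The distinction in that last sentence, per-level coefficients versus one test for the factor as a whole, can be illustrated with a small base-R sketch. The data here are simulated and the names `sim`, `full`, and `reduced` are hypothetical. It uses a plain `lm()` rather than a mixed model, so it only shows the idea of comparing nested models and is not a recommended analysis for this design.

```r
# Simulated data: four treatment levels, 25 observations each (made-up values).
set.seed(42)
sim <- data.frame(
  Treatment = factor(rep(paste0("Treatment", 1:4), each = 25)),
  y         = rnorm(100, mean = 4, sd = 3)
)

full    <- lm(y ~ Treatment, data = sim)  # model with the Treatment factor
reduced <- lm(y ~ 1, data = sim)          # same model without it

# One row per coefficient: the intercept plus each non-reference level.
coefs <- summary(full)$coefficients
print(coefs)

# A single F test for Treatment as a whole (3 degrees of freedom).
overall <- anova(reduced, full)
print(overall)
```

`summary()` compares each non-reference level with the reference level, while `anova()` on the two nested fits gives one test for the whole factor. The same nested-model comparison is commonly applied to mixed models as a likelihood ratio test.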
I would really appreciate some help with this. I feel like I'm missing something obvious here; it has been quite a few long days deep in R and my brain is a bit frazzled.
I'm relatively new to R, so explanations in relatively simple terms would be appreciated!
Thanks a lot ahead of time!
Repeated Measures ANOVA:
Code:
model1 <- aov(NumberOfVisits ~ SubjectType*Treatment*Year*BlockNum + Error(Subject/Year), data=dframe1)
Code (transformations tried):
trans_Y <- (dframe1$NumberOfVisits)^3
trans_Y <- (dframe1$NumberOfVisits)^(1/9)
trans_Y <- log(dframe1$NumberOfVisits)
trans_Y <- log(dframe1$NumberOfVisits+0.1)
trans_Y <- log(dframe1$NumberOfVisits+0.000001)
trans_Y <- log10(dframe1$NumberOfVisits)
trans_Y <- exp(dframe1$NumberOfVisits)
trans_Y <- abs(dframe1$NumberOfVisits)
trans_Y <- sin(dframe1$NumberOfVisits)
trans_Y <- asin(dframe1$NumberOfVisits)
Code (GLMM):
library(lme4)
model1 <- glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = inverse), data = dframe1)
Output:
Warning message:
In glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1 | Year), :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
And so trying lmer:
model1 <- lmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = identity), data = dframe1)
Warning in lme4::lmer(formula = NumberOfVisits ~ SubjectType * Treatment * BlockNum + :
passing control as list is deprecated: please use lmerControl() instead
Error in (function (optimizer = "bobyqa", restart_edge = TRUE, boundary.tol = 1e-05, :
unused arguments (tolPwrss = 1e-07, compDev = TRUE, nAGQ0initStep = TRUE, checkControl = list(check.nobs.vs.rankZ = "ignore", check.nobs.vs.nlev = "stop", check.nlev.gtreq.5 = "ignore", check.nlev.gtr.1 = "stop", check.nobs.vs.nRE = "stop", check.rankX = "message+drop.cols", check.scaleX = "warning", check.formula.LHS = "stop", check.response.not.const = "stop"), checkConv = list(check.conv.grad = list(action = "warning", tol = 0.001, relTol = NULL), check.conv.singular = list(action = "ignore", tol = 1e-04),
check.conv.hess = list(action = "warning", tol = 1e-06)))
In addition: Warning messages:
1: In lmer(NumberOfVisits ~ SubjectType * Treatment * BlockNum + (1 | Year), :
calling lmer with 'family' is deprecated; please use glmer() instead
2: In lme4::glmer(formula = NumberOfVisits ~ SubjectType * Treatment * BlockNum + :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
Code (log link):
model1 <- glmer(NumberOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = gaussian (link = log), data = dframe1) # random intercept
Error in eval(expr, envir, enclos) :
cannot find valid starting values: please specify some
Code (Poisson):
model1 <- glmer(DurationOfVisits ~ SubjectType*Treatment*BlockNum + (1|Year), family = poisson, data = dframe1)
Output:
summary(model1)
Call:
lm(formula = DurationOfVisits ~ SubjectType*Treatment*BlockNum, data = dframe1)
Residuals:
    Min      1Q  Median      3Q     Max
-8.3251 -3.9093 -0.5325  2.1748 18.9394
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.9967     1.6053   2.490   0.0138 *
Treatment2    3.1750     2.1018   1.511   0.1328
Treatment3    0.1306     2.1018   0.062   0.9505
Treatment4   -0.7279     2.1018  -0.346   0.7295
...