Thanks for reading this post! The core of my question has to do with how to correctly code repeated measures when the repeated effect (here "sampling", could easily be "time") isn't the same for all of the subjects. In context:

I have sampled 15 houses 3-8 times each for the presence or absence of a bacterium. Within each "house" there are "real_locs" (specific physical locations), and each sampled location has a unique identifier. Real_loc is the subject on which repeated measures were taken. Each real_loc falls into an "environment" category, and samplings happened in different "seasons". I believe "sampling" is the "repeated effects" predictor variable. Here's what confuses me: each house was sampled up to 8 times, approximately 3 months apart, but sampling did not begin at the same time for all houses; the start dates are scattered across 2 years. So Sampling 1 does not correspond to the same calendar time for different houses, but all the real_locs in a given house were sampled at the same time for that house's Sampling 1 (and so on for Sampling 2, etc.).
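To illustrate the layout described above, here is a minimal sketch with made-up rows. It assumes the data are in long format, with one row per real_loc per sampling. The data set name layout_sketch, the house and location IDs, the environment names, and all values are hypothetical and are not taken from the real data.

```sas
/* Hypothetical illustration only: invented rows, not real data.     */
/* One row per real_loc per sampling. Note that Sampling 1 falls in  */
/* winter for house H01 but in summer for house H02.                 */
data layout_sketch;
   input house $ real_loc $ environment $ sampling season $ Recovery;
   datalines;
H01 H01_L1 kitchen 1 winter 0
H01 H01_L1 kitchen 2 spring 1
H01 H01_L2 bath 1 winter 0
H01 H01_L2 bath 2 spring 0
H02 H02_L1 kitchen 1 summer 1
H02 H02_L1 kitchen 2 fall 1
;
run;

/* Print the sketch to check that it was read as intended. */
proc print data=layout_sketch;
run;
```

In this sketch, "sampling" is only an order index within a house, while "season" carries the calendar information.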

My questions that I want to incorporate in my model are:

a&b) Are there differences among types of environments and seasons in the probability of recovering our bacterium? (should be fixed effects)

c&d) Is there significant variation among houses in rates of recovery, and does recovery among environments vary across houses? (I think these are G-side random effects)

e) Is the probability of recovering the bacterium at a later sampling correlated with whether it was recovered there in the past? (in other words, not only do I know I need to accommodate the repeated measures in the model, I am actually interested in whether there is a "significant effect of repeated sampling in a location").

My code is below. It is based on my familiarity with PROC MIXED, the GLIMMIX User's Guide (especially the pages on repeated measures), and a helpful exchange with SAS Tech Support. Thanks to that exchange, each run of the model now takes 10-30 minutes instead of hours (the data set is both large and very imbalanced).

proc glimmix data=mydata;

class house sampling environment real_loc season;

model Recovery=environment season sampling / dist=binary link=logit ddfm=residual;

random int environment / subject=house;

random sampling / subject=real_loc type=AR(1) residual;

covtest / wald;

nloptions tech=nrridg maxiter=250;

run;

This code sort of makes sense, but I remain worried that I have something wrong with the repeated measures. I would have thought that sampling was a random factor rather than a fixed one, and/or that it would be nested within house. However, all the sources I have consulted indicate that the repeated effect is included both as a fixed effect in the MODEL statement and in the RANDOM statement that codes the repeated measures. So my questions are:

1) Does this code correspond to my questions?

2) If so, what is the significant F-test of the fixed effect "sampling" telling me?

3) Is it correct to interpret the significant covariance parameter "AR(1)" with subject "real_loc" as telling me that there is a significantly greater likelihood of finding the bacterium again in a location where it was found once before?

Thank you for your help and your time,

Susi