0-3 bounded continuous dependent variable

#1
Hello all,

I am working on a dataset that is difficult for my abilities. The dependent variable is continuous, bounded on [0, 3], and was measured from 2000 to 2019 at several locations (not necessarily the same locations each year). So there is a spatial component that I would like to account for.

My independent variables are year (both as a categorical variable, year, and as a continuous one, year_lin) and catvar (categorical).

It is an observational study; there is no replication or randomization.
The goal is to examine how Y changed over the past two decades while controlling for catvar across the entire region.
So I am trying different variations (different distributions and covariance structures) of this mixed model:

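proc glimmix data = data plots=all;
class state year catvar;
/* fixed effects: linear year trend, catvar, and their interaction */
model Y= year_lin|catvar/dist=lognormal ddfm=satterth solution;
/* G-side: random intercept for each state*year combination */
random intercept/subject=state*year type=vc ;
/* R-side: spherical spatial covariance based on (long, lat) distances */
random intercept/subject=state*year type=sp(sph)(long lat) residual;
run;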

The second RANDOM statement (the R-side spatial structure) never worked: the procedure could not find good starting values.
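When hunting for starting values it can help to see what the spherical structure actually computes. A minimal sketch, assuming the SP(SPH) form given in the SAS documentation, sigma2 * [1 - 3d/(2*rho) + d^3/(2*rho^3)] for d <= rho and 0 beyond the range rho, with d the Euclidean distance between (long, lat) pairs. The coordinates, sigma2 and rho below are hypothetical. Because SUBJECT=state*year, only locations sharing a state and a year enter the same block, so state-years with just one or two locations carry little information about rho, which may be part of why it is hard to estimate.

```python
import math


def spherical_cov(coords, sigma2, rho):
    """Spherical covariance matrix for a list of (long, lat) pairs.

    Assumes the SP(SPH) form: sigma2 * (1 - 1.5*h + 0.5*h**3) with h = d / rho
    for distances d <= rho, and 0 beyond the range rho. Distances are plain
    Euclidean distances between the coordinate pairs (degree units here).
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            if d <= rho:
                h = d / rho
                cov[i][j] = sigma2 * (1.0 - 1.5 * h + 0.5 * h ** 3)
    return cov


# Hypothetical locations that share one state*year subject.
coords = [(-90.0, 35.0), (-90.5, 35.2), (-93.0, 36.0)]
for row in spherical_cov(coords, sigma2=1.0, rho=2.0):
    print([round(v, 3) for v in row])
```

With long/lat in degrees, d and therefore rho are in degree units, so a sensible starting value for rho is on the scale of the distances between locations within a subject.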

I have also tried transforming Y to the 0-1 range and using a beta distribution. No matter what I do, there are always problems with the residuals that I cannot solve.
The dataset is not very large (~650 data points).
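A beta model needs every response strictly inside (0, 1), so dividing by 3 alone still leaves any observed 0s and 3s on the boundary. A minimal sketch of one commonly used fix, assuming the squeeze p = (s(n - 1) + 0.5)/n applied after rescaling s = y/3, where n is the number of observations (about 650 here). The function names are hypothetical; the example values are ones quoted later in this thread plus the two bounds.

```python
def to_open_unit(y, upper=3.0):
    """Rescale y from [0, upper] to [0, 1], then squeeze into (0, 1).

    Assumes the squeeze p = (s * (n - 1) + 0.5) / n, where s = y / upper and
    n is the number of observations.
    """
    n = len(y)
    return [((v / upper) * (n - 1) + 0.5) / n for v in y]


def from_open_unit(p, n, upper=3.0):
    """Undo the squeeze and the rescaling, returning values on [0, upper]."""
    return [upper * (v * n - 0.5) / (n - 1) for v in p]


# Boundary values plus example values mentioned in this thread.
y = [0.0, 0.34, 1.27, 2.94, 3.0]
p = to_open_unit(y)
print(p)
print(from_open_unit(p, n=len(y)))
```

With only five values the squeeze is large (0 maps to 0.1); with n around 650 the boundary values would move only to about 0.0008 and 0.9992.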

Any ideas of what might work?

Thanks
 

fed2

#2
PROC GLIMMIX is a royal pain in the ass. If you're going lognormal, why not just log-transform and head on over to PROC MIXED? And what about the 0's in [0, 3]? You can't take the log of those, can you? GEE is another option, and probably going to be a lot easier. Doesn't GLIMMIX go into 'GEE mode' under some covariance structures?
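If you do log-transform and fit the model in PROC MIXED, keep in mind that back-transforming a prediction with exp() gives the median of Y, not its mean. A minimal sketch, assuming a normal model for log(Y) with predicted mean mu and residual variance sigma2 on the log scale; the numbers are hypothetical.

```python
import math


def lognormal_backtransform(mu, sigma2):
    """Back-transform predictions from a normal model fitted to log(Y).

    mu: predicted mean on the log scale; sigma2: residual variance on the
    log scale. exp(mu) estimates the median of Y, while the mean of Y also
    depends on the variance: exp(mu + sigma2 / 2).
    """
    median = math.exp(mu)
    mean = math.exp(mu + sigma2 / 2.0)
    return median, mean


# Hypothetical log-scale prediction and residual variance.
median, mean = lognormal_backtransform(mu=0.2, sigma2=0.3)
print(f"median = {median:.3f}, mean = {mean:.3f}")
```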

Good luck!
 
#4
Thank you both for the replies. There are only a handful of zeros. I can add 0.0001 to every value so I won't lose any data points when using the log transformation. I was using GLIMMIX to fit a beta distribution after converting everything to the (0, 1) range. I will try PROC MIXED too.
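The size of the added constant matters more than it might seem: log(0.0001) is about -9.21 while log(0.34) is about -1.08, so zeros shifted this way sit roughly 8 log units below a value like 0.34 and can act as extreme low outliers. A quick sensitivity check, assuming the example values from this thread plus one zero (zero_gap is a hypothetical helper):

```python
import math


def zero_gap(values, c):
    """Distance on the log scale between a shifted zero, log(0 + c), and the
    smallest positive value after the same shift."""
    smallest_positive = min(v for v in values if v > 0)
    return math.log(smallest_positive + c) - math.log(c)


# Example values from this thread plus one zero.
values = [0.0, 0.34, 1.27, 2.94]
for c in (0.0001, 0.01, 0.1):
    print(f"c = {c}: gap = {zero_gap(values, c):.2f} log units")
```

If the fixed-effect estimates change noticeably as c changes, the zeros are driving the fit on the log scale.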

The variable is continuous (there are values such as 0.34, 1.27, 2.94, etc.), so I don't think ordered logistic regression is a valid approach.

Thanks
 
#6
Hello again,

I am still working on the same data, and I have a question about the residual plots. Everything looks good (histogram, Q-Q plot, and box plot) apart from the residual vs. fitted plot (top left). I have tried many different models and I keep seeing this tilted rectangle.
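One possible source of the tilted rectangle, offered as a hypothesis rather than a diagnosis: because Y is bounded on [0, 3], a raw residual r = y - fitted can only lie between -fitted and 3 - fitted. On a residual vs. fitted plot those limits are two parallel lines with slope -1, and since the fitted values span a limited range, the admissible region is a parallelogram. A minimal simulation sketch with hypothetical data (normal errors clipped to [0, 3]) that checks every residual falls inside that envelope:

```python
import random


def residual_bounds(fitted, lower=0.0, upper=3.0):
    """Smallest and largest possible raw residual y - fitted when lower <= y <= upper."""
    return lower - fitted, upper - fitted


random.seed(1)
# Hypothetical fitted values and responses clipped to the [0, 3] range.
fitted = [random.uniform(0.5, 2.5) for _ in range(200)]
y = [min(3.0, max(0.0, f + random.gauss(0.0, 0.8))) for f in fitted]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Count residuals lying between the two slope -1 lines r = -fitted and r = 3 - fitted.
inside = sum(
    residual_bounds(f)[0] <= r <= residual_bounds(f)[1] for f, r in zip(fitted, resid)
)
print(f"{inside} of {len(resid)} residuals lie inside the envelope")
```

The clipped values line up exactly on the two edges; if the real plot shows points stacked along lines like these, the pattern may come from the bounded response itself rather than from a missing term.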


This model works best so far.

Code:
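proc glimmix data=data plots=all plots=boxplot;
class A B location year;
/* fixed effects: A, B, linear year trend, and all their interactions; normal errors */
model Y=A|B|year_lin/ddfm=kr2 solution dist=n;
/* G-side: random intercept for each location within each year */
random location/subject=year;
/* R-side: AR(1) correlation among observations within the same year */
random _residual_/subject=year type=ar(1);
run;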
I include year as a fixed continuous variable (year_lin) and year (as categorical) as the subject in the random residual statement to account for any R-side correlated errors.
I have also tried to account for possible spatial R-side correlated errors using the coordinates of each location, but that resulted in covariance parameter estimates of zero.
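It may help to write out what the AR(1) residual statement with SUBJECT=year implies. The R matrix is block diagonal: observations from different years are assumed uncorrelated, and within a year two observations have correlation rho^k, where k is how far apart they are in the ordering within that year. So this structure does not model correlation across years, and within a year the "lag" reflects record order rather than the distance between locations. A minimal sketch of that covariance, assuming the lag is the position difference in data order within each year (ar1_block_cov and the example years are hypothetical):

```python
def ar1_block_cov(subject_ids, sigma2, rho):
    """R-side covariance for TYPE=AR(1) with a SUBJECT= effect.

    Observations in different subjects are uncorrelated. Within a subject,
    the covariance is sigma2 * rho**k, where k is the difference between the
    two observations' positions in that subject (assumed: data order).
    """
    positions = []
    seen = {}
    for s in subject_ids:
        positions.append(seen.get(s, 0))
        seen[s] = seen.get(s, 0) + 1
    n = len(subject_ids)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if subject_ids[i] == subject_ids[j]:
                cov[i][j] = sigma2 * rho ** abs(positions[i] - positions[j])
    return cov


# Hypothetical data: three records from 2000 and two from 2001.
years = [2000, 2000, 2000, 2001, 2001]
for row in ar1_block_cov(years, sigma2=1.0, rho=0.5):
    print(row)
```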

I believe there is something I am not accounting for in the model.

Any thoughts on this?
Thanks
 
