GLM for a response variable with negative values

#1
I'm trying to fit a GLM to a response variable that has negative values (Fig. 1). I'm using several predictor variables with different distributions (Fig. 2, Fig. 3, and Fig. 4).

I have doubts about which family and link function I should choose. My response variable seems to follow a gamma distribution, but I think it's not correct (or even possible) to fit the model with the gamma family when the response variable has negative values. I'm using R to fit this model.

I'm pretty new to these types of models, and I would be so grateful if somebody could help me out with this situation.


Figure 1. Response variable (rate of change in basal area, m² ha⁻¹ year⁻¹)

Figure 2. First predictor variable (basal area, m² ha⁻¹)

Figure 3. Second predictor variable (drought intensity)

Figure 4. Third predictor variable (change in drought intensity, ºC)
 

Buckeye

Active Member
#2
Maybe you could try an OLS model to see how it looks? The normality assumption refers to the residuals, not the response. I think there are transformations that change the range of the dependent variable, but I need to read further into this.
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
You are correct: the gamma distribution is bounded below at zero, so it cannot accommodate negative values. Post a histogram of the dependent variable. Maybe just use a normal distribution with an identity link. Poisson is used with rates, but you seem to have differences.

P.S., Please don't create duplicate threads.
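
To illustrate the two points above, here is a minimal R sketch on simulated stand-in data (the variable names `change` and `drought` and all numbers are assumptions, not the poster's data). It shows that `glm()` stops with an error for the Gamma family when the response contains non-positive values, and that a Gaussian family with identity link fits without complaint and reproduces the OLS coefficients.

```r
# Simulated stand-in data (hypothetical): a response that takes negative values
set.seed(1)
n <- 200
drought <- runif(n, 0, 3)
change  <- 0.4 - 0.3 * drought + rnorm(n, sd = 0.5)
dat <- data.frame(change, drought)

# Gamma family: glm() stops because the response must be strictly positive.
# tryCatch() captures the error message instead of halting the script.
gamma_fit <- tryCatch(
  glm(change ~ drought, family = Gamma(link = "log"), data = dat),
  error = function(e) conditionMessage(e)
)
print(gamma_fit)  # an error message rather than a fitted model

# Gaussian family with identity link: accepts any real-valued response
gauss_fit <- glm(change ~ drought, family = gaussian(link = "identity"), data = dat)
ols_fit   <- lm(change ~ drought, data = dat)

# The Gaussian/identity GLM and OLS give the same coefficient estimates
print(all.equal(coef(gauss_fit), coef(ols_fit)))
```

Whether the Gaussian model is adequate still has to be judged from the residuals, not from the histogram of the response.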
 

hlsmith

#4
Also, if there are issues with estimating the central tendency of the DV, there is something called M-estimation. It is a maximum-likelihood-type estimation approach that can be robust to some traditional threats to linear regression.
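
As a rough sketch of what M-estimation looks like in R, the example below compares OLS with a Huber M-estimator fitted by `rlm()` from the MASS package (assumed to be installed; it ships with standard R distributions). The data are simulated and the contamination is hypothetical; it is only there to show how outlying observations are down-weighted.

```r
library(MASS)  # provides rlm() for M-estimation

# Simulated data (hypothetical) with a known linear trend
set.seed(2)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.5)

# Contaminate a few randomly chosen observations with gross outliers
idx <- sample(n, 5)
y[idx] <- y[idx] + 15

ols_fit <- lm(y ~ x)
rob_fit <- rlm(y ~ x, psi = psi.huber)  # Huber M-estimator

# Compare the coefficient estimates side by side
print(rbind(OLS = coef(ols_fit), Huber = coef(rob_fit)))

# Final robustness weights: the contaminated points get weights well below 1
print(round(rob_fit$w[idx], 3))
```

Note that the Huber estimator protects against outliers in the response; it is not designed to handle high-leverage points in the predictors.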
 
#5
Thank you for your answer. Sorry for the duplicate thread, I realized there is a regression forum and I think it would be better there.

The dependent variable is shown in Figure 1.

Thanks.
 

hlsmith

#6
Another approach may be to use Poisson, but don't model the change; just model the second rate and control for the first rate in the model. This is often preferred because a rate difference of 10 could represent an infinite range of scenarios (e.g., 40 to 50, 2 to 12, ...).

@GretaGarbo, any suggestions? You seem fairly savvy with glm.
 

Buckeye

Active Member
#7
Are you suggesting to use the first rate as an offset? I think there's an argument in the glm function for this.
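
For reference, the two options under discussion could be sketched as follows. This is only an illustration on simulated counts; `count1`, `count2`, and `drought` are hypothetical names, and the thread's actual variables may not be counts at all. In `glm()`, an offset can be supplied either through the `offset` argument or with `offset()` inside the formula.

```r
# Simulated stand-in data (hypothetical): two count-type measurements per unit
set.seed(3)
n <- 150
drought <- runif(n, 0, 3)
count1  <- rpois(n, lambda = 20) + 1   # first measurement, kept > 0 so log() is defined
count2  <- rpois(n, lambda = count1 * exp(0.1 - 0.2 * drought))  # second measurement
dat <- data.frame(count1, count2, drought)

# (a) Control for the first measurement as a covariate: its coefficient is estimated
fit_cov <- glm(count2 ~ drought + log(count1),
               family = poisson(link = "log"), data = dat)

# (b) Use the first measurement as an offset: its coefficient is fixed at 1,
#     so the model describes the ratio count2 / count1
fit_off <- glm(count2 ~ drought + offset(log(count1)),
               family = poisson(link = "log"), data = dat)

print(coef(fit_cov))
print(coef(fit_off))
```

The covariate version is the more flexible of the two, since the offset version is the special case where the coefficient of `log(count1)` equals 1.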
 

hlsmith

#8
Well, it depends on what the variables look like; rates can be a variety of things.

@JulianT, can you provide a histogram for those variables too?
 
#9
It seems like the response variable is a change variable ("rate of change in basal area, m² ha⁻¹ year⁻¹").

This looks like a paired t-test situation, but in this case you have other explanatory factors too.

Another formulation of the paired test is to stack the response values (y) under each other and create a block variable (field) that indicates which rows come from the same field; you can then add the other explanatory variables (x).

example:

response  field  treatment  explanatory
y1        1      t1         x1
y2        1      t1         x2
y3        2      t2         x3
y4        2      t2         x4
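
A minimal sketch of the stacked layout in R, on simulated data (the names `before`, `after`, `field`, and `time` are hypothetical). It checks that a paired t-test and a linear model with the field as a block give the same p-value for the before/after difference.

```r
# Simulated stand-in data (hypothetical): two measurements on each of 30 fields
set.seed(4)
n_fields <- 30
before <- rnorm(n_fields, mean = 25, sd = 5)            # first measurement per field
after  <- before + rnorm(n_fields, mean = 0.5, sd = 1)  # second measurement per field

# Stack the two measurements under each other, with a block variable for the field
long <- data.frame(
  response = c(before, after),
  field    = factor(rep(seq_len(n_fields), times = 2)),
  time     = factor(rep(c("before", "after"), each = n_fields),
                    levels = c("before", "after"))
)

# Paired t-test on the two original vectors
tt <- t.test(after, before, paired = TRUE)

# The same comparison as a linear model with field as a block
fit <- lm(response ~ field + time, data = long)
p_block <- summary(fit)$coefficients["timeafter", "Pr(>|t|)"]

# The two p-values agree
print(all.equal(tt$p.value, p_block))
```

In the stacked form, further explanatory variables can be added to the model formula, which the plain paired t-test does not allow.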
 

katxt

Well-Known Member
#11
Buckeye said: "Maybe you could try an OLS model to see how it looks?"
This is an excellent idea. You could start with response = change. What could influence change? Predictor = drought.
Plot change vs drought. Does it look straightish? Does either need a transformation?
Regress change onto drought.
Do a normal plot of the residuals. Is the line fairly straight?
Plot residuals vs predicted values and see if there is any noticeable pattern. A pattern may indicate a possible improvement.
Show us these plots and we can comment.
If the model is multiplicative, it may be worth trying again using %change rather than change.
It's too early for gammas and link functions (I think). The main advantage of a glm is that you can include categorical predictors like country.
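
The steps above can be sketched in base R as follows. The data are simulated stand-ins (the names `change` and `drought` and the numbers are assumptions); with real data, only the two variable definitions would change.

```r
# Simulated stand-in data (hypothetical)
set.seed(5)
n <- 200
drought <- runif(n, 0, 3)
change  <- 0.4 - 0.3 * drought + rnorm(n, sd = 0.5)

# 1. Plot change vs drought: does the relationship look roughly straight?
plot(drought, change)

# 2. Regress change onto drought
fit <- lm(change ~ drought)
res <- resid(fit)

# 3. Normal plot of the residuals: points should follow the reference line
qqnorm(res)
qqline(res)

# 4. Residuals vs predicted values: look for curvature or a funnel shape
plot(fitted(fit), res, xlab = "Predicted", ylab = "Residual")
abline(h = 0, lty = 2)
```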