Beta Regression Coefficients

#1
In a simple model, x is a continuous (normally distributed) variable predicting y. Since y values are proportions ranging from 0 to 1 (0%-100%), simple linear regression may give out-of-bounds estimates for some predicted values (i.e., lower than 0 or higher than 1).

Therefore, I have decided to use beta regression with boundaries from 0 to 1 (I used the betareg() function in the betareg R package; the software, however, is not important). While it is easy to interpret the unstandardized regression parameter from a linear model (see the linear model output below: B = 0.126, indicating that y increases by 12.6 percentage points if x rises by 1), I am not sure how to understand, transform, or use the parameters from the betareg model to get a meaningful interpretation of the coefficient (see the beta regression output below).


Output for linear regression model: lmMod = lm(formula = y ~ x)
Code:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.57936    0.10849  -5.340 9.57e-07 ***
x            0.12591    0.01354   9.296 4.07e-14 ***

Output for beta regression model: betaMod = betareg(formula = y ~ x)
Code:
Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.85712    0.52580  -9.238   <2e-16 ***
x            0.56796    0.06498   8.740   <2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)    
(phi)    7.686      1.184   6.491 8.54e-11 ***

How can I interpret the parameter 0.568 in the beta regression output (together with the intercept)? Is there a way to use 0.568 to get the absolute change in y (i.e., if x increases by 1, y increases by XX; since y is in %, the interpretation would be easy)?
Thank you! M.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I looked into something similar about 6 months ago. I believe I came across a technical paper from a SAS user group that gave a good description. Sorry, I am not at my computer right now, but I will see if I can find it.
 

hlsmith

#3
It might have been Paper: 335:2011. It looks like they use a logistic-style interpretation.
 
#4
Thank you a lot for helping,
Logistic interpretation means B1 is a change in log odds, right? So I can use exp(B1) to get an "odds" value (= 1.765), which I don't understand at all.
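A minimal sketch of what exp(B1) measures, assuming only the coefficients from the beta regression output above; the x values are arbitrary and chosen for illustration. Under the logit link, exp(B1) is an odds ratio: the factor by which the odds mu/(1 - mu) of the mean proportion are multiplied when x rises by 1. It is not a change in y itself.

```r
# Coefficients copied from the beta regression output (logit link)
alpha <- -4.85712
beta  <-  0.56796

# Odds of the mean proportion, mu / (1 - mu), at a given x
odds <- function(x) exp(alpha + beta * x)

# Ratio of odds for a one-unit step in x (x = 8 to x = 9, illustrative values)
odds_ratio <- odds(9) / odds(8)

# The same ratio holds for any one-unit step, e.g. x = 4 to x = 5
other_ratio <- odds(5) / odds(4)

all.equal(odds_ratio, exp(beta))
all.equal(other_ratio, exp(beta))

round(exp(beta), 3)
# [1] 1.765
```

So the odds ratio is constant across x, while the change in the proportion itself is not.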

Perhaps there is another way to go: I am considering using the simple linear regression mentioned above and then defining the "meaningful" range of its application (i.e., use the linear regression equation to compute the value of x that would predict y = 0, and then the upper-bound value of x that would predict y = 1). Does this make any sense? I am not sure, but I really need to know how an increase in X changes the value of Y (in %).
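A rough sketch of that range calculation, assuming only the coefficients from the linear regression output above. It solves the fitted line for the x values at which the predicted y hits 0 and 1; whether that interval is actually "meaningful" also depends on the observed range of x, which is not shown here.

```r
# Coefficients copied from the linear regression output
a <- -0.57936
b <-  0.12591

# Solve a + b * x = 0 and a + b * x = 1 for x
x_lower <- (0 - a) / b   # fitted y reaches 0 here
x_upper <- (1 - a) / b   # fitted y reaches 1 here

round(c(x_lower, x_upper), 2)
# [1]  4.60 12.54
```

Outside this interval the fitted line returns values below 0 or above 1.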

BTW, the relationship Y~X can be seen as linear:


Thank you,
 
#5
The logit model:

log(p/(1-p)) = alpha + beta*x

can be solved for p:

p = exp(alpha + beta*x)/(1 + exp(alpha + beta*x))

or, equivalently,

p = 1/(1 + exp(-(alpha + beta*x)))

It gives these numbers:
Code:
# the linear regression model parameter estimates
a <-   -0.57936 
b <-    0.12591 

a + b*8
# [1] 0.42792
#seems reasonable

a + b*9
# [1] 0.55383 


# the beta-regression model with logit link: 
alpha <-  -4.85712 
beta  <-   0.56796 

# log(p/(1-p)) = alpha + beta*x gives

# p = 1/(1+exp(-(alpha + beta*x)))

p0 =  1/(1+exp(-(alpha + beta*8))) 
p0
# [1] 0.4222753

p1 =  1/(1+exp(-(alpha + beta*9))) 
p1
# [1] 0.5632887

p1 - p0 
# [1] 0.1410134   changing from x=8 to x=9 

# compare with the above linear model
0.55383  -  0.42792
# 0.12591 

# they are two different models so they don't give exactly the same result
# but similar results
But if your original data were 0/1 success/failure outcomes, then it might be more natural to fit the usual logistic regression.
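One more sketch, again assuming only the beta regression coefficients posted above: with a logit link there is no single "y increases by XX per unit of x" number, because the slope of the fitted mean is beta * p * (1 - p) and so depends on where you are on the curve. The grid of x values below is arbitrary.

```r
# Coefficients copied from the beta regression output (logit link)
alpha <- -4.85712
beta  <-  0.56796

inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Slope of the fitted mean with respect to x: beta * p * (1 - p)
slope <- function(x) {
  p <- inv_logit(alpha + beta * x)
  beta * p * (1 - p)
}

x_grid <- c(6, 8, 9, 11)   # arbitrary illustrative values
round(slope(x_grid), 3)

# The slope is steepest where p = 0.5, i.e. at x = -alpha / beta
x_mid <- -alpha / beta
beta / 4
# [1] 0.14199

# With the fitted object itself, something like
# predict(betaMod, newdata = data.frame(x = x_grid), type = "response")
# should give the fitted means directly (untested here, no data available).
```

So the largest per-unit change in the fitted proportion is about 14 percentage points near the middle of the curve, and it shrinks toward both ends.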
 

hlsmith

#7
Can you post a histogram of your dependent variable values? Linear regression can be acceptable if the bulk of the values lie near 0.5 with little dispersion.
 
#8
Sure,
Just to mention that each data point represents the difficulty parameter of a test item, estimated from measurements on ~200 individuals.
The point of the linear/beta regression was to model how the theoretical complexity of an item (given by its construction) relates to its empirical difficulty.

M

 