# Regression analysis with a proportion for the dependent variable

#### Imicola

##### New Member
Hi,

I want to run a regression analysis on some data I have. The dependent variable is a proportion, and the independent variables are all continuous.

At first I used linear regression, but then I thought perhaps this is not right, as in some cases it would predict values over 1 for my dependent variable, which is nonsensical.

Is it ok to use linear regression in this case, or is there a different type of multivariate regression analysis to be used for proportion/percentage data?

Sorry if this is a stupid question - it has been a while since I did any regression analysis and I haven't been able to find a definitive answer in any of my textbooks or online.

Thanks,
Nicola

#### babucher

##### New Member
Someone else with more insight should probably comment here, but remember that it's perfectly fine (and often highly recommended) to only use the model to look at the region that is interpolated. That is, the model might be adequate to represent the region that you have data in, but outside the range of your data, the model is not applicable.

#### james

##### New Member
If the values should only range between 0 and 1, perhaps give logistic regression (logit) a shot? Not sure that I fully understand the question, but it might be applicable.

#### wcs

##### New Member
As james mentioned, I think the best thing for such an analysis is logistic regression.
(Alternatively, you can convert the proportions to logits or empirical logits and run a linear analysis on those.)
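A minimal sketch of the empirical-logit route, assuming you have the counts behind each proportion. The data are made up for illustration, and the 0.5 correction is one common convention for keeping the logit finite when a count is 0 or equal to the total; the helper names are hypothetical.

```python
import math

def empirical_logit(successes, trials):
    """log((y + 0.5) / (n - y + 0.5)); stays finite even when y is 0 or n."""
    return math.log((successes + 0.5) / (trials - successes + 0.5))

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def inverse_logit(z):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up illustrative data: one continuous predictor, counts out of 20.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
successes = [0, 3, 10, 16, 20]
trials = [20, 20, 20, 20, 20]

# Linear regression on the logit scale, then back-transform the fitted values.
z = [empirical_logit(y, n) for y, n in zip(successes, trials)]
intercept, slope = fit_line(x, z)
predicted = [inverse_logit(intercept + slope * xi) for xi in x]
```

Because the predictions are back-transformed through the inverse logit, they cannot fall outside 0 and 1, even for predictor values beyond the observed range.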

#### TheAnalysisFactor

##### New Member
I want to confirm that logistic regression is the appropriate analysis. Keeping predictions between 0 and 1 is actually one of the reasons for using logistic regression instead of linear.

One caveat, though. If all of your proportions are between .2 and .8, you can just run linear regression. This is hard to explain without a drawing, but when you plot a continuous predictor against a proportion response variable, the shape is sigmoidal--a flattened S shape. It gets flat at both ends, near 0 and 1, which is why the problem you described exists. But in the middle, it's actually quite linear. So while a linear regression isn't theoretically "correct," as long as you're in the middle of the graph, it will model the data quite well.
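In place of a drawing, here is a rough numerical sketch. It compares the standard logistic curve with a straight line, taking the tangent at a proportion of 0.5 as the line (an assumption made for simplicity; a least-squares line fitted over the middle of the range would sit somewhat closer). The function name is hypothetical.

```python
import math

def tangent_line_gap(p):
    """Gap between the tangent line at p = 0.5 and the logistic curve, at proportion p."""
    z = math.log(p / (1.0 - p))      # linear predictor at which the logistic curve equals p
    straight_line = 0.5 + z / 4.0    # tangent to the logistic curve at z = 0 (slope 1/4)
    return straight_line - p

for p in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    print(f"p = {p:.2f}  gap = {tangent_line_gap(p):+.3f}")
```

Under these assumptions the line is off by roughly 0.01 at a proportion of 0.7 and roughly 0.05 at 0.8, and by the time the curve reaches 0.95 the line has already passed 1.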

Karen

#### wcs

##### New Member
I've seen .3 and .7 in a paper for the bounds where linear regression works fine.
I wonder if it's somehow down to how "strict" one is about how close the straight line is to the logistic?
Maybe it depends on the particular data being analysed?

#### Mahi

##### New Member
I suggest you use probit analysis, which is meant for modelling proportions,
whereas logistic regression is used to model a dichotomous dependent variable.

Hope this helps.

#### TheAnalysisFactor

##### New Member
I think it depends on the strength of the "slope." A steeper sigmoid curve will stay linear longer, I believe.

#### jamesmartinn

##### Member
I agree with Mahi; I think probit analysis should be used here.

#### P J L

##### New Member
Hi,

I know this is an old thread, but for the benefit of anyone searching on this topic, I thought I'd add that if you look up "Probit analysis" in Dobson & Barnett's book "An Introduction to Generalized Linear Models", you find a description of that method, followed by the statement: "Another model that gives numerical results very much like those from the probit model, but which computationally is somewhat easier, is the logistic or logit model."
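To get a feel for how close the two models are, here is a small sketch comparing the probit curve (the standard normal CDF) with a rescaled logistic curve. The scaling constant 1.702 is a conventional choice for lining the two up and is not taken from the book quoted above.

```python
import math

def probit_cdf(z):
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.702  # assumed rescaling constant so the two curves overlap as closely as possible

# Evaluate both curves on a grid from -6 to 6 and record the largest disagreement.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(probit_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

On this grid the two curves never differ by as much as 0.01 in predicted proportion, which is consistent with the quoted statement that the numerical results are very similar.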

So it seems that logistic regression can be used to analyse proportions. However, note that whichever link function is used, you need to include both the numerator and denominator in the analysis, rather than just the proportions. This can be done either directly, or by using the number of cases as weights, as suggested here: https://stat.ethz.ch/pipermail/r-help/2001-February/011070.html
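As a rough illustration of what "including both the numerator and denominator" means, here is a minimal sketch that fits a logit model to grouped counts by Newton's method, using only the standard library. The data are made up, the function name is hypothetical, and the sketch handles a single predictor only; in practice you would use your statistics package's binomial GLM routine instead.

```python
import math

def fit_binomial_logit(x, successes, trials, iterations=25):
    """Maximum-likelihood logit fit for grouped binomial data; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p          # observed minus expected successes
            w = ni * p * (1.0 - p)       # binomial variance: larger denominators weigh more
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up illustrative data: numerators and denominators, not just proportions.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
successes = [1, 3, 10, 16, 19]
trials = [20, 20, 20, 20, 20]

intercept, slope = fit_binomial_logit(x, successes, trials)
fitted = [1.0 / (1.0 + math.exp(-(intercept + slope * xi))) for xi in x]
```

The denominators enter through the weights `ni * p * (1 - p)`, so a proportion based on 200 cases pulls on the fit more than the same proportion based on 20. That information is lost if only the proportions are supplied.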