Linear regression vs logistic regression

Mukund

I have a time series dataset. The independent variable X is time, indexed as 1, 2, 3, ..., 1000. The dependent variable Y is a percentage, for example 99%, 98.7%, 96%, 91%, and so on. Y is continuous.

I have 1000 such data points. The first 700 are used as the training set and the remaining 300 as the test set.
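
A minimal sketch of the split described above, in Python. The series here is invented, and `chronological_split` is a hypothetical helper. The point is that the split follows time order rather than shuffling, so all 300 test points come after the 700 training points:

```python
# Hypothetical stand-in for the real data: time index 1..1000 and a
# percentage that drifts downward. Replace with the actual series.
t = list(range(1, 1001))
y = [99.0 - 0.02 * i for i in t]


def chronological_split(xs, ys, n_train):
    """Split by position in time: the first n_train points train, the rest test.

    There is deliberately no shuffling, so every test point lies after every
    training point, mimicking forecasting the future from the past.
    """
    return (xs[:n_train], ys[:n_train]), (xs[n_train:], ys[n_train:])


(t_train, y_train), (t_test, y_test) = chronological_split(t, y, 700)
print(len(t_train), len(t_test))  # 700 300
print(t_train[-1], t_test[0])     # 700 701
```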

I tried simple linear regression, but some of its predictions come out above 100%. The problem is even worse for the confidence and prediction intervals, which extend further beyond 100%.
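
A minimal Python sketch of why this happens, using an invented series and hypothetical helpers (`ols_fit`, `logit`, `expit`). A fitted straight line has no upper limit, so once it is extended it can cross 100%. One common remedy, swapped in here rather than taken from this thread, is to fit the line to logit(Y/100) and map predictions back with the logistic function, which keeps them strictly between 0% and 100%. This assumes no observation is exactly 0% or 100%, where the logit is undefined.

```python
import math


def ols_fit(xs, ys):
    """Simple linear regression y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def logit(p):
    # Log-odds; defined only for 0 < p < 1.
    return math.log(p / (1.0 - p))


def expit(z):
    # Inverse of logit: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical series: a proportion (percentage / 100) rising toward 1.
t = list(range(1, 1001))
p = [expit(-1.0 + 0.006 * i) for i in t]
t_train, p_train = t[:700], p[:700]

# 1) Straight line on the raw proportion: nothing stops it passing 1.0 (100%).
a, b = ols_fit(t_train, p_train)
raw_pred = a + b * 1000

# 2) Straight line on logit(p), mapped back with expit(): always inside (0, 1).
a2, b2 = ols_fit(t_train, [logit(v) for v in p_train])
bounded_pred = expit(a2 + b2 * 1000)

print(f"raw: {100 * raw_pred:.1f}%  logit-scale: {100 * bounded_pred:.1f}%")
```

Interval endpoints computed on the logit scale can be mapped back through `expit()` the same way, so they stay within 0% to 100% as well.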

So I tried logistic regression, since my data are bounded (from 0% to 100%). But logistic regression takes only binary data, and I am confused about how to convert my existing time series data appropriately so that I can try logistic regression on it.
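
Logistic regression is usually introduced for 0/1 outcomes, but the logistic curve can also be fit to proportions directly by maximizing the Bernoulli log-likelihood with fractional responses, an approach often called fractional logit or quasi-binomial regression. The Python sketch below does this with Newton's method on invented data. The helper names (`fit_logit` and the others) are hypothetical, and this is one option for bounded data rather than a definitive recipe:

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + e^z) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def objective(a, b, xs, ys):
    # Bernoulli log-likelihood sum(y*z - log(1 + e^z)), z = a + b*x.
    # With fractional ys in [0, 1] it serves as a quasi-likelihood.
    return sum(y * (a + b * x) - softplus(a + b * x) for x, y in zip(xs, ys))


def fit_logit(xs, ys, iters=100):
    """Maximise objective() over (a, b) by Newton's method with step halving."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p          # gradient
            gb += (y - p) * x
            haa += w             # negative Hessian
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        step, old = 1.0, objective(a, b, xs, ys)
        # Halve the Newton step until it no longer decreases the objective.
        while step > 1e-8 and objective(a + step * da, b + step * db, xs, ys) < old:
            step /= 2.0
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < 1e-10:
            break
    return a, b


# Hypothetical data: Y as a proportion (percent / 100); time rescaled to t/1000.
t = list(range(1, 1001))
x = [i / 1000.0 for i in t]
y = [sigmoid(-1.0 + 6.0 * xi) for xi in x]

a, b = fit_logit(x[:700], y[:700])
forecast = [sigmoid(a + b * xi) for xi in x[700:]]  # always within (0, 1)
print(f"a={a:.3f}, b={b:.3f}, last forecast={100 * forecast[-1]:.1f}%")
```

Because the responses enter the likelihood directly, values of exactly 0% or 100% are allowed. Standard errors from this quasi-likelihood usually call for a robust (sandwich) correction, which the sketch omits.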

Dragan

The easiest answer is to "censor" your data: convert percentages (the dependent variable) from 70% to 100% into scores of 1, and convert percentages below 70% into scores of 0. You could then use binary logistic regression.
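
A minimal Python sketch of this approach, with invented percentages and hypothetical helper names. Values of 70% or more become 1 and the rest 0, as described above, and a binary logistic regression of that label on (rescaled) time is fit to the first 700 points by Newton's method. The invented data deliberately let the two labels overlap in time: if every 1 came before every 0 (complete separation), the maximum-likelihood coefficients would not exist and would diverge.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + e^z) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_lik(a, b, xs, ys):
    # Bernoulli log-likelihood for 0/1 labels, z = a + b*x.
    return sum(y * (a + b * x) - softplus(a + b * x) for x, y in zip(xs, ys))


def fit_logistic(xs, ys, iters=100):
    """P(y = 1) = sigmoid(a + b*x), fit by Newton's method with step halving."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga, gb = ga + (y - p), gb + (y - p) * x             # gradient
            haa, hab, hbb = haa + w, hab + w * x, hbb + w * x * x  # negative Hessian
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        step, old = 1.0, log_lik(a, b, xs, ys)
        while step > 1e-8 and log_lik(a + step * da, b + step * db, xs, ys) < old:
            step /= 2.0
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < 1e-10:
            break
    return a, b


# Hypothetical percentages: a downward drift plus a regular wobble.
t = list(range(1, 1001))
pct = [100.0 - 0.04 * i + 8.0 * math.sin(i / 7.0) for i in t]
labels = [1 if v >= 70.0 else 0 for v in pct]  # 70%-100% -> 1, below 70% -> 0

x = [i / 1000.0 for i in t]                     # rescaled time
a, b = fit_logistic(x[:700], labels[:700])
p_test = [sigmoid(a + b * xi) for xi in x[700:]]  # P(Y >= 70%) in the test period
print(f"a={a:.2f}, b={b:.2f}, P(Y >= 70%) at t=1000: {p_test[-1]:.3f}")
```

The fitted model predicts the probability that a point is at or above 70%, not the percentage itself; the recoding discards how far each point sits above or below the cut-off.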

noetsi

If you have time series data, it's questionable whether either linear or logistic regression is ideal. Something like an ARDL (autoregressive distributed lag) model might be better.
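
A minimal Python sketch of the idea, with invented data and hypothetical helpers (`solve`, `ols`). When time is the only regressor, an ARDL model collapses to an autoregression with a linear trend, y_t = c + φ·y_{t-1} + β·t + e_t. The sketch fits that AR(1)-plus-trend model by ordinary least squares on t = 1..700 and makes one-step-ahead forecasts for t = 701..1000. A real analysis would more likely use a library implementation and choose the lag order from the data.

```python
import random


def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def ols(rows, targets):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data from y_t = 38 + 0.6*y_{t-1} - 0.004*t + noise (percent),
# which drifts from about 95% down toward 85%.
random.seed(0)
y = [95.0]  # y[0] is a starting value; y[t] is the observation at time t
for t in range(1, 1001):
    y.append(38.0 + 0.6 * y[-1] - 0.004 * t + random.gauss(0.0, 0.5))

# Fit on t = 1..700: regress y_t on [1, y_{t-1}, t].
train = range(1, 701)
c_hat, phi_hat, beta_hat = ols([[1.0, y[t - 1], float(t)] for t in train],
                               [y[t] for t in train])

# One-step-ahead forecasts for t = 701..1000, each using the observed y_{t-1}.
test = range(701, 1001)
preds = [c_hat + phi_hat * y[t - 1] + beta_hat * t for t in test]
rmse = (sum((p - y[t]) ** 2 for p, t in zip(preds, test)) / len(test)) ** 0.5
print(f"c={c_hat:.2f} phi={phi_hat:.3f} beta={beta_hat:.5f} test RMSE={rmse:.3f}")
```

Like simple linear regression, this model does not by itself keep forecasts within 0% to 100%.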