Logistic regression curve looks like linear regression

#1
Hi, I performed both a linear and a logistic regression analysis (both in Matlab) and plotted the results. The logistic regression curve looks very linear and can be described by the same function as the linear regression. I explain the linearity by the fact that I am only seeing a very small fraction of the whole logistic curve, far away from the asymptotes. But can it actually be that both are described by the same function?

A confused beginner.
 

ondansetron

TS Contributor
#2
Can you show us the plots?

What is being plotted on the Y axis in each case? For the logistic regression, the log odds should be roughly linear as a function of the x-variables, but if you're plotting the predicted probabilities, the curve should be sigmoidal (or, for a smoothed function of the observed 0/1 outcomes, roughly sigmoidal).
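To make the scale distinction concrete, here is a minimal Python sketch (the thread otherwise uses Matlab and R, so treat this as a translation of the idea, with made-up coefficients b0 and b1 standing in for a fitted model): on the log-odds scale the model is exactly linear in x, while the predicted probabilities follow a sigmoid whose slope is only near its maximum of 0.25 around the midpoint.

```python
import numpy as np

def expit(z):
    """Inverse logit: maps log odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log odds of a probability."""
    return np.log(p / (1.0 - p))

# made-up coefficients standing in for a fitted logistic model
b0, b1 = 0.0, 1.0
x = np.linspace(-5, 5, 1001)

eta = b0 + b1 * x      # linear predictor: exactly linear in x
p = expit(eta)         # predicted probabilities: sigmoidal in x

# taking the logit of the probabilities recovers the straight line
print(np.allclose(logit(p), eta))            # True

# the probability curve is only locally linear: its slope p*(1-p)
# peaks at 0.25 at the midpoint and shrinks toward the tails
print(round(float(np.max(p * (1 - p))), 3))  # 0.25
```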
 

Dason

Ambassador to the humans
#4
It's sigmoidal over the entire domain/range but if you just look at a subset it can be quite linear.

[Image: LogisticVsLinear2.png]

If you just look at the plot for y between .25 and .75, a linear fit is pretty darn good. So if it turns out that what you're modeling has predicted probabilities mainly in that region, then a linear fit won't be too bad. Trying to use the linear fit beyond the input range you fit the model with could be problematic, though.
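This can be checked numerically. Below is a rough Python sketch of the same experiment as the R code later in the thread (np.polyfit over -1 <= x <= 1 playing the role of lm()): the line's worst error inside the fitting window is small, but extrapolated into the tails it is large, and the fitted values even leave [0, 1].

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.arange(-5, 5.001, 0.01)
y = expit(x)

# fit a straight line using only the mid-range points, -1 <= x <= 1
mask = np.abs(x) <= 1
slope, intercept = np.polyfit(x[mask], y[mask], 1)
yhat = intercept + slope * x

err_mid = np.max(np.abs(y - yhat)[mask])    # worst error inside the window
err_tail = np.max(np.abs(y - yhat)[~mask])  # worst error when extrapolating

print(err_mid < 0.05)   # True: the line hugs the curve mid-range
print(err_tail > 0.5)   # True: it badly misses in the tails
print(yhat.min() < 0)   # True: extrapolation even leaves [0, 1]
```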
 

hlsmith

Not a robit
#5
Yes @Dason - I assumed this is likely what he is referencing as well, though you never know without them explicitly posting the plots. Given your scenario, it can be interesting to see which subset of observations lands in the tails. In particular, if they are using multiple regression models, are those subgroups positive or negative for most of the covariates of interest? When using propensity weights, trimming of extreme weights gets used - but once you trim, you have to acknowledge you are using a different data sample, conditional on some process.
 

Dason

Ambassador to the humans
#6
I'm dumping the code for the above graphic here since I don't care enough to save it to my system but I took the time to write it so might as well...
R:
# Create the data for a logistic curve
xs <- seq(-5, 5, by = .01)
ys <- plogis(xs)

# Let's do some plotting and save to a png
png("LogisticVsLinear.png")
# Create plot area with labels but no points
# basegraphics4life
plot(xs, 
     ys, 
     type = "n", 
     ylim = c(-.2,1.2), 
     main = "Logistic vs Linear - Midrange", 
     ylab = "y", 
     xlab = "x")
# Add in the logistic curve
lines(xs, ys, col = "blue")

# Plot the asymptotic boundaries
abline(h = 0)
abline(h = 1)

# you can define the area you want the line
# to 'best fit' for.  In this case it was
# for -1 <= x <= 1
id <- which(abs(xs) <= 1)
xs_red <- xs[id]
ys_red <- ys[id]
o <- lm(ys_red ~ xs_red)

# Plot the best fit line
abline(o, lty = 2)

# Add a legend.
legend("topleft", 
       c("Logistic", "Linear"), 
       col = c("blue", "black"), 
       lty = c(1,2))
dev.off()
 
#7
@Dason, that's what I thought too.
In both cases I am plotting performance (1 = 100%) on the y-axis, but for the logistic regression it's the predicted probabilities.
[Images: ASK2.png, ASK.png]
The first plot shows the linear regression, the second the logistic regression.