Hi all, this is my first post here so please let me know if I need to provide more details.

I want to generate a multivariate logistic regression model and use that to predict future outcomes. The logistic model gives me point estimates, but I also want prediction intervals. I have only been able to find explanations of how to calculate prediction intervals from univariate models, but I have a multivariate model and do not know how to generalize. (I know that there are functions in, say, R and other packages to automatically compute prediction intervals, but I need to be able to do it -- and understand it -- without relying on these.)

The specific model doesn't matter, but for the sake of discussion we can just say that the model coefficients are:

Intercept = -30

Time since company was founded, in days = 0.008

Daily revenue, in thousands of dollars = 0.001

Size of market, in billions of dollars = 0.01

The point estimate is trivial to calculate from this:

`odds = exp(-30 + 0.008*time + 0.001*revenue + 0.01*market)`

The prediction intervals are tripping me up, though. I'm writing in R because I'm hoping I can convey myself more clearly this way, but let me know if another notation is preferred. I'm not even sure that my example is correct for the one-variable situation (for one, I left out the uncertainty in the intercept term!). For one variable, I understand the formula to be along the lines of the following R code (though perhaps not exactly):

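```
x <- c(1,4,5,9,13,11,23,23,28)
y <- c(64,71,54,81,93,76,77,95,109)
df <- data.frame(x,y)
n <- nrow(df)
model <- lm(y~x,data=df) # one-variable linear fit, used only to illustrate the formula
df$pred <- predict(model,df) # fitted values; these stand in for the predicted odds
# Residual standard error with n - 2 degrees of freedom (intercept and slope are estimated)
sigma <- sqrt( sum((df$y-df$pred)^2) / (n-2) )
# Standard error of a new observation at each x
df$sePredX <- sigma * sqrt( 1 + 1/n + (df$x - mean(df$x))^2/sum((df$x - mean(df$x))^2) )
tcrit <- qt(0.975, df = n-2) # about 2.36 for 7 degrees of freedom, rather than a fixed 1.97
df$pred05 <- df$pred - tcrit*df$sePredX # lower bound of the 95% interval (2.5th percentile)
df$pred95 <- df$pred + tcrit*df$sePredX # upper bound of the 95% interval (97.5th percentile)
```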
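
As a sanity check on the one-variable case only, the hand calculation can be compared against R's built-in `predict()` with `interval = "prediction"`. This is a minimal sketch on the same toy data, assuming the critical value comes from the t distribution with n - 2 degrees of freedom; the names `manual`, `builtin`, `se_pred` and `tcrit` are just illustrative. It says nothing yet about the multi-variable case.

```r
# Toy data from the one-variable example
x <- c(1, 4, 5, 9, 13, 11, 23, 23, 28)
y <- c(64, 71, 54, 81, 93, 76, 77, 95, 109)
df <- data.frame(x, y)
n <- nrow(df)
model <- lm(y ~ x, data = df)

# Hand-computed 95% prediction interval
pred <- predict(model, df)
sigma <- sqrt(sum((df$y - pred)^2) / (n - 2))          # residual standard error
se_pred <- sigma * sqrt(1 + 1/n +
  (df$x - mean(df$x))^2 / sum((df$x - mean(df$x))^2))  # SE of a new observation
tcrit <- qt(0.975, df = n - 2)                         # assumption: t quantile, not a fixed 1.97
manual <- cbind(fit = pred,
                lwr = pred - tcrit * se_pred,
                upr = pred + tcrit * se_pred)

# Built-in interval, used here only as a reference to check against
builtin <- predict(model, newdata = df, interval = "prediction", level = 0.95)

# Compare the two sets of bounds, ignoring row and column names
print(all.equal(unname(manual), unname(builtin)))
```

If the two disagree, the first things I would check are the degrees of freedom and the critical value.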

Most importantly, I have no idea how to generalize this to multiple independent variables; it seems that this approach only works with one independent variable. I'd greatly appreciate any help, suggestions, or pointers. Thanks.
