Hello, this is my first post. I am currently working on my PhD, fitting a multivariate logistic regression model, but I have a problem regarding the sample sizes of my observation groups:

- The "success" (1) event group has a sample size of 249 distinct observations.

- The "non success" (0) event group has a sample size of 48,957; it is a significant part of the population and many times larger than the "success" group.

So when I fit a multivariate logistic regression model, two things happen:

- The p-values of the model coefficients are always significant at the 1% alpha level.

- The variation in the fitted predicted probabilities between the two groups is very small, even when the independent variables are good predictors.

It was suggested that I bootstrap the model, so here is my doubt:

- Should I do it the more usual way, taking smaller, arbitrarily sized random samples from the complete sample (both groups) with replacement (or without replacement in this case?)

OR

- Should I keep the small "success" group constant and then add an equal number of different, randomly picked "non success" cases in each sample?

In both cases this is not exactly the usual bootstrap, since I want to draw sub-samples from a larger group (the original big sample), and perhaps what I want is not a bootstrap at all. My idea is to minimize the discrepancy between the two group sizes. Is this theoretically correct?
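To make the second scheme concrete, here is a minimal sketch of what I mean, assuming the data live in a NumPy feature matrix `X` and a 0/1 label vector `y` (the function name and interface are just illustrative, not from any particular library):

```python
import numpy as np

def balanced_subsamples(X, y, n_reps=100, rng=None):
    """Yield balanced sub-samples: keep every "success" (1) case and pair
    it with an equal-sized random draw, without replacement, from the
    much larger "non success" (0) group."""
    if rng is None:
        rng = np.random.default_rng()
    pos_idx = np.flatnonzero(y == 1)   # all 249 success cases, kept every time
    neg_idx = np.flatnonzero(y == 0)   # the 48,957 non-success pool
    for _ in range(n_reps):
        neg_draw = rng.choice(neg_idx, size=pos_idx.size, replace=False)
        idx = np.concatenate([pos_idx, neg_draw])
        yield X[idx], y[idx]
```

Each yielded sub-sample is exactly 50/50, which is what makes it a case-control style sub-sampling scheme rather than a classical bootstrap (which would resample the whole dataset with replacement at its original size).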

The final objective is to obtain an "average" model with the mean coefficients and use it to calculate the probability of the "non success" cases actually being "successes".
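A rough sketch of that averaging step, again with illustrative names (the plain gradient-ascent fit stands in for whatever logistic regression routine is actually used). One caveat I have read about: because the balanced sub-samples artificially set the success rate to 50%, the averaged intercept should be corrected back toward the true prevalence — the prior-correction idea of King & Zeng (2001) — otherwise the predicted probabilities will be far too high:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Plain gradient-ascent logistic regression; returns [intercept, slopes...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return w

def averaged_model(subsamples, tau):
    """Fit one model per balanced sub-sample and average the coefficients.
    `tau` is the true proportion of successes in the full data; since each
    sub-sample has a 50% success rate, the King-Zeng intercept correction
    reduces to subtracting log((1 - tau) / tau)."""
    W = np.array([fit_logistic(Xs, ys) for Xs, ys in subsamples])
    w = W.mean(axis=0)
    w[0] -= np.log((1 - tau) / tau)
    return w

def predict_proba(X, w):
    """Probability of "success" under the averaged, corrected model."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))
```

This is only a sketch of the pipeline as I understand it, not a claim that coefficient averaging is the standard estimator; pooling predicted probabilities across sub-models instead of pooling coefficients is another option.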

Or does anybody have another idea for this question?
