Assume the original data contains 1000 goods and 1 bad.

I build a logistic regression, use the model to score the bad, and get probability = 0.00001.

Then I use oversampling/undersampling to rebalance the original data, so with oversampling I now have 1000 goods and 1000 bads.
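
For concreteness, here is a minimal sketch of the oversampling step, assuming plain random oversampling (duplicating the minority class with replacement) and binary labels where 1 = bad. The function name `random_oversample` and the toy one-feature rows are hypothetical, not taken from the post.

```python
import random

def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate minority-class rows (sampled with replacement) until both classes are the same size.

    Assumes exactly two classes; this is a hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    minority = [(r, y) for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    # Draw extra copies of minority rows until the minority count matches the majority count.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    new_rows = [r for r, _ in combined]
    new_labels = [y for _, y in combined]
    return new_rows, new_labels

# Hypothetical toy data matching the post's counts: 1000 goods (label 0) and 1 bad (label 1).
rows = [[float(i)] for i in range(1001)]
labels = [0] * 1000 + [1]
X_bal, y_bal = random_oversample(rows, labels, minority_label=1)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # 1000 1000
```

With only one bad in the original data, all 1000 "bads" after oversampling are copies of the same row, so the model sees no new information about bads, only a different class balance.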

Then I build a logistic model using that data and apply it to the original data, and for that bad I get probability = 0.5.

However, this probability needs to be adjusted to reflect the original data, so after doing some math you get an adjusted probability lower than 0.5 (for example 0.00001). So what is the point of oversampling/undersampling if you are required to adjust the probability?
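
The "math" in question is usually a prior correction. The sketch below assumes the standard odds-ratio adjustment for logistic regression fit on resampled data: multiply the model's odds by (original odds of bad) / (training-sample odds of bad), which is equivalent to shifting the intercept by log(original odds) − log(sample odds). This presumes the model is otherwise reasonably well specified. The function name `prior_correct` is hypothetical, and the inputs simply reuse the post's counts, so the result is illustrative rather than the post's exact 0.00001.

```python
import math

def prior_correct(p_sample, sample_bad_rate, true_bad_rate):
    """Map a probability from a model fit on resampled data back to the original class balance.

    Multiplies the model's odds by (true odds / sample odds); equivalently, shifts the
    logistic-regression intercept by log(true odds) - log(sample odds).
    Requires 0 < p_sample < 1 and both rates strictly between 0 and 1.
    """
    odds_sample = p_sample / (1.0 - p_sample)
    true_odds = true_bad_rate / (1.0 - true_bad_rate)
    sample_odds = sample_bad_rate / (1.0 - sample_bad_rate)
    odds_true = odds_sample * (true_odds / sample_odds)
    return odds_true / (1.0 + odds_true)

# Post's setup: balanced training sample (1000 bads of 2000 rows), original data 1 bad of 1001 rows.
p = prior_correct(0.5, sample_bad_rate=1000 / 2000, true_bad_rate=1 / 1001)
print(p)

# Same adjustment written as an intercept shift on the log-odds scale.
shift = math.log((1 / 1001) / (1000 / 1001)) - math.log(0.5 / 0.5)
print(1.0 / (1.0 + math.exp(-(0.0 + shift))))  # log-odds of 0.5 is 0.0
```

Note that the correction is a strictly increasing function of `p_sample`, so it changes the calibrated probabilities but not the ranking of the scored cases.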