thanks for your reply. But there is only 1 person goes into bad. Not 10. The original data has proportion 1000:1 so 0.1%. I still don’t understand why duplicate this record would help?
Assume original data contains 1000 goods and 1 bad
I build a logistic regression and use the the model to score the bad and I get probability = 0.00001
Then I use oversampling/undersampling to increase/decrease the original data so now I have 1000 goods and 1000 bags if I use oversampling.
Then...