Hello!
I am trying to predict whether a vehicle sells at a "quick auction" using vehicle characteristics such as age, make, and actual cash value. I would like to start with something like a mixed logistic regression model. I have about 10,000 cases, with 7 non-events for every 3 events (i.e., 30% of vehicles sell at a quick auction). My understanding is that the usual approach to this kind of imbalance is to over- or under-sample the training data. If I do that, does it change how I should interpret the model for inference? I believe the other options are to use a penalized model of some sort or to include weights in some way. Thanks for the help!
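To make the inference question concrete: as far as I understand, for a plain (non-mixed) logistic regression, sampling events and non-events at different rates leaves the slope coefficients roughly unchanged but shifts the intercept by the log of the ratio of the sampling rates, so predicted probabilities have to be mapped back to the original scale. Below is a minimal sketch of that arithmetic, assuming all events are kept and non-events are thinned until the split is 50/50. The function and variable names are only for illustration, and I am not sure how well this carries over to a mixed model.

```python
import math


def intercept_shift(event_keep_rate, nonevent_keep_rate):
    """Amount added to the intercept (log-odds scale) by the resampling."""
    return math.log(event_keep_rate / nonevent_keep_rate)


def correct_probability(p_resampled, event_keep_rate, nonevent_keep_rate):
    """Map a probability from a model fit on resampled data back to the
    original population by undoing the shift in the odds."""
    odds_resampled = p_resampled / (1.0 - p_resampled)
    odds_original = odds_resampled * nonevent_keep_rate / event_keep_rate
    return odds_original / (1.0 + odds_original)


# Assumed setup: 30% events, 70% non-events in the original data.
# Keeping every event and 3/7 of the non-events gives a 50/50 sample.
event_keep_rate = 1.0
nonevent_keep_rate = 3.0 / 7.0

shift = intercept_shift(event_keep_rate, nonevent_keep_rate)

# A vehicle the balanced model scores at 0.5 (hypothetical value)
p_balanced = 0.5
p_original = correct_probability(p_balanced, event_keep_rate, nonevent_keep_rate)

print(f"intercept shift on the log-odds scale: {shift:.4f}")
print(f"balanced-sample probability {p_balanced} -> original scale {p_original:.2f}")
```

If this is right, a prediction of 0.5 from the balanced model corresponds to the 30% base rate in the original data, so the raw predicted probabilities and the intercept from a resampled fit should not be read at face value.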