Logistic regression - different cutpoints for classification and probability?

#1
I was reading an article on logistic regression tonight, and I noticed the following:

"Logistic regression can be used to classify observations as events or nonevents
as was done in discriminant-classification analysis. ... To use this information you would search through the classification table to find the probability cut-off point that produces the best classification performance. In our example a probability of .22 to .28 produces rules that have the highest overall successful classification rate."

Are they saying that it could possibly be appropriate to say: your probability of having cancer is 25% according to our model; however, based on our cutpoint, we classify you as having cancer?

I'm confused, because I was of the opinion that you should be consistent with your classifier - that is, if you change your cutpoint to, say, 25%, then you should rescale your probabilities around that point.
 

vinux

#2
I'm confused, because I was of the opinion that you should be consistent with your classifier - that is, if you change your cutpoint to, say, 25%, then you should rescale your probabilities around that point.
The cut-off point depends on the event rate (the event percentage).
A cut-off around this point produces rules with the highest overall successful classification rate.
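A quick way to check this on your own data (just my sketch, with made-up numbers): compare the overall classification rate at a cut-off equal to the observed event rate against the conventional 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data with a low event rate; the numbers here are made up.
rng = np.random.default_rng(1)
x = rng.normal(size=(5000, 1))
y = (rng.uniform(size=5000) < 1 / (1 + np.exp(-(-1.5 + x[:, 0])))).astype(int)

p = LogisticRegression().fit(x, y).predict_proba(x)[:, 1]
event_rate = y.mean()

for c in (0.5, event_rate):
    rate = ((p >= c).astype(int) == y).mean()
    print(f"cut-off {c:.2f}: overall classification rate {rate:.3f}")
```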
 
#3
The cut-off point depends on the event rate (the event percentage).
A cut-off around this point produces rules with the highest overall successful classification rate.
Now, what you say makes perfect sense to me. But it seems that you have to make a mutually exclusive trade-off: either set your cutpoint so that your relative odds work out appropriately, or set your cutpoint to maximize your classification rate.

By maximizing your "relative odds", I mean that you could optimize for the "correctness" of the odds. So the best model would be one in which 1 out of every 10 packets that are given a 10% chance of being 'signal' are truly signal; 2 out of 10 packets that are given an 80% chance of being 'signal' are truly not signal; etc.

But it seems that by doing that, you don't simultaneously optimize your cutpoint for maximal classification. Empirically, I have demonstrated this to myself with dozens of models fit to my data. I can either have excellent binary classification, or good "meaning" in the predictions (again, where a 20% prediction really means that about 2 in 10 of those packets, although classified as noise, are indeed signal), but not both.
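For concreteness, here is a self-contained sketch (entirely made-up simulated data) of how I measure the two criteria side by side on one fitted model: the best achievable classification rate over a sweep of cut-offs, and a crude calibration gap, i.e. the mean absolute difference between predicted and observed event rates across probability bins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data and a single fitted logistic model.
rng = np.random.default_rng(2)
x = rng.normal(size=(5000, 2))
true_logit = -1.0 + x[:, 0] - 0.5 * x[:, 1]
y = (rng.uniform(size=5000) < 1 / (1 + np.exp(-true_logit))).astype(int)

p = LogisticRegression().fit(x, y).predict_proba(x)[:, 1]

# Criterion 1: best overall classification rate over a sweep of cut-offs.
cutoffs = np.linspace(0.01, 0.99, 99)
best_rate = max(((p >= c).astype(int) == y).mean() for c in cutoffs)

# Criterion 2: crude calibration gap over ten equal-width probability bins.
edges = np.linspace(0, 1, 11)
gaps = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (p >= lo) & (p < hi)
    if mask.any():
        gaps.append(abs(p[mask].mean() - y[mask].mean()))

print(f"best classification rate over all cut-offs: {best_rate:.3f}")
print(f"mean |predicted - observed| across bins:   {np.mean(gaps):.3f}")
```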