How do p-values influence prediction results?

#1
I'm running a logistic regression model in R and have come up with some questions.
Looking into the model details, all of the variables' p-values are higher than 0.05, and their coefficients are negative.
I thought this model would fail to predict properly, but the prediction score, the F2 in particular, was actually higher than 0.93.
And I'm just wondering how these outcomes (the p-values and the prediction result) relate to each other.
I'd like to know whether the model is worthwhile or not.
Thanks.
 
#4
You could have an overfitted model: a small sample and a ton of predictors. Tell us more about the context: sample size, proportion with the outcome, number of predictors, and how the predictors are formatted!
 
#5
Thanks all for your advice! I guess what @hlsmith said got right to the point.
I have only a small and imbalanced dataset - train data: 561 (1: 429, 0: 132), test data: 241 (1: 176, 0: 65), and 183 variables.
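(For a sense of scale, a quick back-of-the-envelope check: a common rule of thumb for logistic regression asks for roughly 10 events per candidate predictor, and here even the larger class is dwarfed by 183 predictors.)

```r
# Events per variable (EPV) for the training data described above.
# Rule of thumb: roughly 10+ events per predictor; the minority class
# here (132 zeros) is smaller than the predictor count on its own.
events     <- 132   # minority-class cases in the training set
predictors <- 183
events / predictors # ~0.72, far below 10
```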

What can I do to build a better (you know, at least acceptable) model under these conditions, regardless of the prediction score at this stage?

Thank you!!!
 
#6
@noetsi thanks a lot! I tried to check multicollinearity with vif() in R, but it returns "Error in vif.default(fit) : there are aliased coefficients in the model".
How can I check multicollinearity in this case?
And is there anything I can do to avoid multicollinearity? I've already been through step() in R.
Thanks.
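(A minimal sketch of one way to chase that error down, assuming vif() comes from the car package and the fitted model object is called fit; the data and variable names are illustrative. The error means some predictors are aliased, i.e. exact linear combinations of others, so their coefficients cannot be estimated.)

```r
# car::vif() refuses to run when the model contains aliased
# (perfectly collinear) terms. In a glm fit these show up as
# NA coefficients, so they can be listed and dropped first.
fit <- glm(y ~ ., data = train, family = binomial)  # hypothetical fit

aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)  # terms that duplicate information in other columns

# For numeric predictors the coefficient name matches the column
# name, so the aliased columns can be dropped and the model refit:
train2 <- train[, !(names(train) %in% aliased)]
fit2   <- glm(y ~ ., data = train2, family = binomial)
car::vif(fit2)  # should now run
```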
 
#7
Well, you have already tried one approach, so your results are now conditional on that one failing your expectations. I am just mentioning this because, whatever the purpose, fitting multiple models can contribute to false discovery.

You should run a least absolute shrinkage and selection operator (LASSO) or another type of regularized model.
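A minimal sketch of what that could look like with the glmnet package (the data frame names train/test and the outcome column y are assumptions, not from the thread):

```r
library(glmnet)

# LASSO-penalized logistic regression: alpha = 1 is the pure L1
# penalty, and cv.glmnet chooses the penalty strength lambda by
# cross-validation. Many of the 183 coefficients will be shrunk
# exactly to zero, which doubles as variable selection.
x <- as.matrix(train[, setdiff(names(train), "y")])
y <- train$y

set.seed(1)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the lambda with the lowest cross-validated deviance
coef(cv_fit, s = "lambda.min")

# Predicted probabilities on the held-out test set
x_test <- as.matrix(test[, setdiff(names(test), "y")])
pred   <- predict(cv_fit, newx = x_test, s = "lambda.min", type = "response")
```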
 

noetsi

#8
I don't know R. You could check tolerance, a statistic that is very much like VIF (tolerance is just its reciprocal, 1/VIF). I wonder if something is really unusual about your data set; I can't understand why VIF would not run. You might ask about that on the R page here.

There are no really good solutions to multicollinearity. The two best are 1) gather more data (which I am guessing is probably not possible) or 2) find an alternative to one of the variables in your model. One option is to use something like factor analysis to collapse two or more of your variables into one. Another, if you have a theory, is to find a variable that could logically take the place of one of your IVs.
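As a sketch of the collapsing idea, using principal components rather than factor analysis proper (the column names here are hypothetical):

```r
# Suppose x1, x2, x3 are strongly correlated predictors in 'train'.
# Replace the block with its first principal component score.
block <- train[, c("x1", "x2", "x3")]
pc    <- prcomp(block, center = TRUE, scale. = TRUE)

# How much of the block's variance the first component captures
summary(pc)$importance["Proportion of Variance", 1]

train$block_score <- pc$x[, 1]  # one score per row replaces three columns
fit <- glm(y ~ block_score + other_var, data = train, family = binomial)
```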

Personally, I would just report that the variables do a good job of explaining the outcome jointly, but cannot do so individually because of multicollinearity.