If it matters I have effectively the entire population of interest.

- Thread starter noetsi

Do you simply ignore slopes that the p value says are not significant, regardless of relative effect size? The fact that I have the whole population and know that the effect size is real makes it even more confusing to me that the p value says the larger effect is not significant and the smaller effect is significant.

You seem not to be mentioning the SEs. You can have a big slope but a lot of variability, which plays out as doubt in the estimate. You should select variables based on what you are trying to do and the context. I may opt to control for smoking status even though it isn't significant. I shouldn't have candidate variables in the model that I don't care about or that don't have contextual meaning in the first place. So don't put anything in that you don't care about, and only remove a variable if you have a contextual reason.

I am sure you are right about the standard errors. The variable with the smaller but significant slope had an SE a hundred times smaller than the variable with the larger but non-significant slope.
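To make the slope-versus-SE point concrete, here is a minimal sketch with made-up data (nothing here comes from the poster's dataset). It fits two separate simple regressions rather than one multiple regression, purely to keep the arithmetic visible. The helper `slope_and_se` and the variables `x_wide` and `x_narrow` are hypothetical names. Both outcomes get the same noise, but one predictor spans a range 100 times wider than the other.

```python
import math

def slope_and_se(x, y):
    """Return (slope, standard error) for a simple least-squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the usual SE formula sqrt(s^2 / Sxx).
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se

# Made-up data: identical alternating +/-2 noise for both outcomes.
noise = [2.0 if i % 2 == 0 else -2.0 for i in range(10)]
x_wide = [10.0 * i for i in range(10)]    # predictor with a wide range
x_narrow = [0.1 * i for i in range(10)]   # predictor with a narrow range
y_wide = [0.2 * xi + e for xi, e in zip(x_wide, noise)]      # small true slope
y_narrow = [5.0 * xi + e for xi, e in zip(x_narrow, noise)]  # large true slope

b_wide, se_wide = slope_and_se(x_wide, y_wide)
b_narrow, se_narrow = slope_and_se(x_narrow, y_narrow)
t_wide = b_wide / se_wide
t_narrow = b_narrow / se_narrow

print(f"wide:   slope={b_wide:.3f}  SE={se_wide:.4f}  t={t_wide:.2f}")
print(f"narrow: slope={b_narrow:.3f}  SE={se_narrow:.4f}  t={t_narrow:.2f}")
```

In this toy setup the narrow-range predictor has the larger estimated slope, but its SE is 100 times larger, so its t statistic ends up far smaller. That is the same pattern described above: the size of the slope alone says nothing about how precisely it is estimated.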

I guess the real question is if I should use p values since this is effectively a population.

I think this may help give you some more insight on p-values and avoid conclusions like "...one model is right and a second model wrong..." based on p-values.

"However, when building a regression model, the p-values are used to determine whether or not a given predictor variable has a significant impact on the outcome that is being measured. In this context, the p-values show whether or not there is any mathematically calculated reason to keep each predictor variable in the final regression model.

Building regression model is not the same subject as ‘drawing inference from sample mean versus population mean’."

1) "significant impact" is a nonsensical statement as "significance" is an arbitrary dichotomization of the outcome for a particular statistical significance calculation and is

2) p-values are more accurately described, but still imprecisely, as a continuous, summary statistic of how different the observed data are from the expectations of a particular assumption (null hypothesis); not really a "mathematical reason" to keep variables until you apply some subjective criterion to the p-value to make a decision (and the p-value need not be part of the decision at all)

3) Building a regression model generally serves one of two purposes: prediction or inference (you could loosely add description as a third); the two objectives can overlap and blur, but they are frequently different

4) if the colleague did say "drawing inference from sample mean versus population mean", this doesn't really make sense, because inferences are always drawn from sample values; otherwise it's not a matter of inference

5) wanting good predictions from the model can be very different from wanting to examine relationships and make inferences
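Point 2 can be sketched in a few lines. This is only an illustration under a large-sample normal approximation (a t distribution would be used for small samples), and the slope and SE values are made up. The function name `two_sided_p` is hypothetical. The p-value moves smoothly as the SE changes; the "significant" label only appears once a cutoff such as 0.05 is imposed.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: slope = 0, using a normal approximation."""
    z = abs(estimate / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Same made-up slope, slightly different standard errors.
for slope, se in [(1.0, 0.40), (1.0, 0.50), (1.0, 0.52), (1.0, 0.60)]:
    p = two_sided_p(slope, se)
    label = "significant" if p < 0.05 else "not significant"
    print(f"slope={slope:.2f}  SE={se:.2f}  p={p:.4f}  -> {label} at 0.05")
```

The SEs of 0.50 and 0.52 give nearly the same p-value, yet they fall on opposite sides of the 0.05 line. The decision comes from the cutoff, not from the p-value itself.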

Interested to hear what people can offer as criticisms on the accuracy of my points (even if some are picky).

So the p values directly contradict what the population effect size shows.

That's all true. *If* you don't consider any other observations to be of any interest (which includes possible future observations). Want to consider whether the effect might hold at any point in the future? Sorry - you don't have the complete population any more.

If you mean that the population might change in the future, that is true, but then I think that population does not actually exist at this point in time. I am not sure of the logic of saying our population is a subsample of a population that does not actually exist.
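One way to make the two positions in this exchange concrete is the finite population correction for a sample mean under simple random sampling without replacement. This is a simpler setting than the regression slopes in the thread and is offered only as an analogy. The function names and numbers below are made up. When the sample is the whole fixed population (n = N), the design-based standard error is zero. The uncorrected formula, which treats the data as draws from a larger hypothetical population, still reports uncertainty.

```python
import math

def se_mean_finite(s, n, N):
    """Design-based SE of a sample mean: s / sqrt(n) * sqrt(1 - n / N)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return s / math.sqrt(n) * math.sqrt(1.0 - n / N)

def se_mean_infinite(s, n):
    """Usual SE of a sample mean, ignoring the size of the population."""
    return s / math.sqrt(n)

N = 1000   # hypothetical population size
s = 10.0   # hypothetical standard deviation
for n in (100, 500, 900, 1000):
    finite = se_mean_finite(s, n, N)
    infinite = se_mean_infinite(s, n)
    print(f"n={n:4d}  finite-population SE={finite:.4f}  uncorrected SE={infinite:.4f}")
```

Which of the two numbers is relevant depends on the question asked: describing the fixed population that exists now, or saying something about the process that generated it.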