Interpretation of regression estimates based on census data

#1
I'm curious about how to interpret confidence intervals for parameter estimates that come from regression models (multiple regression and logistic regression) based on census data. Since there shouldn't be any sampling error, is there any need to interpret the confidence intervals at all? Shouldn't the estimates rather be considered the "true" parameter values?

For example, suppose I want to study a phenomenon in a county, manage to collect data from all of the county's citizens, and only want to draw conclusions about that particular county.

This of course only applies when one wants to draw conclusions about the population that the census covers.
 

jpkelley

TS Contributor
#2
Hi kabooze,
Could you clarify how you're using regression models? You mention that the regression is based on census data, but it isn't clear what you need confidence intervals for.

I'm assuming you have a feature of the entire population (e.g., Democrat or Republican) and that you're looking at what factors (e.g., household income, parents' ages, etc.) influence, or are associated with, inclusion in one party or the other. In this case, though you have complete population coverage, there's still error in the parameter estimate for household income, etc.

Anyway, tell us more about your data structure, and I'm sure we will all weigh in a bit more.
 
#3
jpkelley,

Actually, it's not data that I have or am going to analyze. I started thinking about this when I read a report that included confidence intervals for means and also the kind of regression models you mentioned above.

In the first case I think it's obviously a mistake to calculate confidence intervals for means, since the dataset is not a sample but the whole population, right?

So, the same interpretation should be made when regression models are estimated from data that is not a sample?
 

spunky

Doesn't actually exist
#4
So, the same interpretation should be made when regression models are estimated from data that is not a sample?
mmmhh... not necessarily so. sampling error is not the only error you deal with when doing statistical modeling. if you go back to the way the multiple regression model is defined in any textbook, you'll see something along the lines of Y = B0 + B1X1 + B2X2 + ... + BpXp + e. that e will always be there because you can have error of measurement, for instance, or your participants will change with time so you cannot predict their scores accurately, or a bazillion other reasons. anyway, keep in mind that just because you don't have sampling error, that does not mean you're not subject to all kinds of other errors out there...
 
#5
No, of course there will be some measurement error, and of course there will be some variation in the DV that is not "caught" by the model, which the error term indicates. But confidence intervals are based only on the point estimates and their standard errors, and they only reflect the sampling error (which is absent when the data isn't a sample), right?
 

spunky

Doesn't actually exist
#6
i guess i was focusing on the standard error of prediction rather than the standard error of the parameters themselves... you're right on that one. uhm... my guess would be that most censuses (although called "census") do not really capture the whole population under consideration. most of the time they apply some pretty advanced sampling weights to the obtained data (think about it... do you think anyone could get ALL of the U.S.? and, even if they did, would all the data be entered correctly? as long as there is no systematic error, error of measurement will go into the standard error of the estimate)
 

bryangoodrich

Probably A Mammal
#7
No, of course there will be some measurement error, and of course there will be some variation in the DV that is not "caught" by the model, which the error term indicates. But confidence intervals are based only on the point estimates and their standard errors, and they only reflect the sampling error (which is absent when the data isn't a sample), right?
A confidence interval can be calculated whenever there is a statistic that is being estimated. Of course, we wouldn't be dealing with a statistic if we were simply enumerating some figures from the population. That is not the case with the census. For one, Census data is not complete. It is also generated for households. From that, we infer certain statistics about the population that reflect individual characteristics (e.g., the percent Hispanic in the U.S.).
 
#8
Maybe census is the wrong word to use (English is not my native language), but I meant the case where the data isn't a sample but is collected from the whole population that you want to make inferences about. If we ignore other sources of error, you shouldn't get any sampling error, so wouldn't the parameter estimates be the "true" ones for the population, given the model (which of course has an error term)? It's like getting the true mean height of a population when you have height data for everyone in that population.

Edit: In that case the confidence interval has zero width, because one should apply a finite population correction (1 - n/N), which is zero when n = N.
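
To make the finite correction in the edit above concrete, here is a minimal sketch in Python. It assumes simple random sampling without replacement and a normal-approximation interval with z = 1.96; the function name `mean_ci_with_fpc` and the height values are made up for illustration and do not come from any report discussed in this thread.

```python
import math

def mean_ci_with_fpc(values, population_size, z=1.96):
    """Approximate CI for a mean, with the finite population correction.

    Assumes simple random sampling without replacement (an assumption
    of this sketch). Returns (mean, lower, upper).
    """
    n = len(values)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= population_size")
    mean = sum(values) / n
    # Sample variance with the usual n - 1 denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Finite population correction: shrinks to 0 when n == N.
    fpc = 1 - n / population_size
    standard_error = math.sqrt(fpc * variance / n)
    return mean, mean - z * standard_error, mean + z * standard_error

heights = [170.0, 165.0, 180.0, 175.0]  # hypothetical data

# Same four values treated as a small sample from N = 400 people...
sample_result = mean_ci_with_fpc(heights, population_size=400)
# ...and as the complete population (n == N).
census_result = mean_ci_with_fpc(heights, population_size=4)

print(sample_result)
print(census_result)
```

When the four values are the whole population, the correction factor is zero and the interval collapses onto the mean. This only addresses sampling error; it says nothing about measurement error or the model's error term.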
 

spunky

Doesn't actually exist
#9
so... upon re-thinking my answer a little bit and going through my articles, i found one that pertains to this situation in particular: Brunner, J. and Austin, P.C. (2009). Inflation of Type I Error Rate in Multiple Regression when Independent Variables are Measured with Error. The Canadian Journal of Statistics. there you can see that error of measurement in the independent variables (not sampling error) gets taken up in the standard error of measurement, which tells you that even if you have every single member of the population, your statistics are still estimates of the parameters that could only be found if you were to measure without error.

now... if you assume you have everyone in the population... and then you also assume that you measured everyone without error... and then you further assume that you managed to remove any kind of stochasticity inherent in the estimation of your parameters... then yes, in that case your estimates should be the same as the parameters of the population.
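
A rough way to see the measurement-error point is a small simulation in Python. This is only a sketch under assumptions made up for illustration (a single predictor, a true slope of 2, normally distributed values, N = 5000); it is not the analysis from the Brunner and Austin paper. Every member of the simulated population is observed, so there is no sampling step, yet the slope fitted on the error-laden predictor differs from the slope fitted on the error-free one.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 5000                # hypothetical: the entire population is observed

# Error-free predictor and an outcome with true slope 2 (assumed values).
true_x = [rng.gauss(0, 1) for _ in range(N)]
y = [2.0 * x + rng.gauss(0, 0.5) for x in true_x]

# The same predictor as actually recorded, with measurement error added.
noisy_x = [x + rng.gauss(0, 1) for x in true_x]

slope_clean = ols_slope(true_x, y)
slope_noisy = ols_slope(noisy_x, y)

print("slope using error-free x:", slope_clean)
print("slope using mismeasured x:", slope_noisy)
```

With these assumed variances the slope on the mismeasured predictor is pulled toward zero by a factor of about var(x) / (var(x) + var(error)) = 0.5, even though nobody was left out of the data.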
 

bryangoodrich

Probably A Mammal
#10
kabooze, if your sample just is the population of interest, then any figures describing the data will just be the population numbers. There would be no confidence intervals because, barring measurement error, there is no estimation or random error. A sample estimator is something we use to infer something about a population that we think the sample represents. But if there is no difference between the representation and the actual population, then we aren't making an inference. We're simply describing the population. For instance, if we have a sample S of a population P, we can use the sample mean ES to estimate the population mean EP. In doing this, our point estimator will have error, and we set, with a certain confidence level, the interval we think EP will reside in based on ES: i.e., with that confidence level we expect EP to fall within a radius CI of ES. But if we're literally calculating EP itself, so that ES = EP, what CI is needed? None! We're not estimating, we're describing EP itself because S = P.