Hi
I am having trouble understanding census populations, variability in census populations, and how and why one is allowed to attribute significance to tests conducted on data derived from census populations vs population samples.
1) Is there any point at which missing data points invalidate a data set as a "census" and render it a "sample"? For example, three morphological measurements were collected on a finite population of animals. The data is historical, and animals with incomplete records (missing one or more measurements) were excluded from the analysis. Is there a percentage of excluded animals that would turn this population into a sample?
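One concrete input to question 1 is how much of the finite population the complete-case exclusion actually removes. Below is a minimal Python sketch, assuming each animal is a record with three hypothetical measurement names (height, length, mass) and `None` marking a missing value. It only counts exclusions; it does not by itself settle where a census becomes a sample.

```python
from typing import Optional

# Hypothetical record layout: one dict per animal, None = missing measurement.
Record = dict[str, Optional[float]]

# Assumed names for the three morphological measurements.
MEASUREMENTS = ("height", "length", "mass")


def split_complete(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Separate animals with all measurements from those missing any."""
    complete = [r for r in records if all(r.get(m) is not None for m in MEASUREMENTS)]
    incomplete = [r for r in records if any(r.get(m) is None for m in MEASUREMENTS)]
    return complete, incomplete


def excluded_fraction(records: list[Record]) -> float:
    """Fraction of the population dropped by excluding incomplete records."""
    _, incomplete = split_complete(records)
    return len(incomplete) / len(records) if records else 0.0


# Illustrative (made-up) population of four animals.
animals: list[Record] = [
    {"height": 1.2, "length": 2.0, "mass": 30.0},
    {"height": None, "length": 2.1, "mass": 31.0},
    {"height": 1.1, "length": 1.9, "mass": 29.5},
    {"height": 1.3, "length": None, "mass": None},
]

print(excluded_fraction(animals))  # 0.5: two of four animals are excluded
```

Reporting this fraction alongside any results makes it explicit how far the analysed group falls short of the full population.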
2) Is variance completely irrelevant to census data? In other words, does the spread in the measurements for a given morphological feature (for instance, height) have any effect on the significance of the difference between means when animals are grouped into categories, for example by the year in which each animal was measured?
I understand that, because a census measures every member of the population, there is theoretically no sampling error, and so sampling error cannot be used to evaluate statistical significance. But does the fact that the data come from a census also rule out using the degree of variance to judge the significance of a difference between two means? In other words, no matter how much variability there is within the groups being compared, is it the case that this variability can never invalidate the difference between the means?
In practical terms, when working with census data, is it valid to create graphs that show means with their standard deviations and to state whether the means of measurements taken in certain year ranges (e.g., 1981-1988 vs. 1989-1992) are significantly different?
I am having a hard time wrapping my brain around how variability in the data due to genetic or environmental causes can be ignored, even when you have data on the entire population.
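The descriptive part of question 2 (per-range means with standard deviations, as they would be plotted) can be sketched as follows. This is a minimal Python sketch assuming the data are (year, value) pairs and using the year ranges named in the question; the example heights are made up. It computes descriptive summaries only and makes no claim about significance.

```python
import statistics

# Year ranges taken from the question; bounds are inclusive.
YEAR_RANGES = {"1981-1988": (1981, 1988), "1989-1992": (1989, 1992)}


def summarize_by_range(
    measurements: list[tuple[int, float]],
) -> dict[str, tuple[int, float, float]]:
    """Return (count, mean, population SD) of the measurement per year range.

    pstdev (divisor N) is used on the assumption that every animal in a
    range was measured; stdev (divisor N - 1) is the sample-based choice.
    """
    summary = {}
    for label, (start, end) in YEAR_RANGES.items():
        values = [v for year, v in measurements if start <= year <= end]
        if values:  # skip ranges with no animals
            summary[label] = (
                len(values),
                statistics.fmean(values),
                statistics.pstdev(values),
            )
    return summary


# Illustrative (made-up) heights as (year measured, height) pairs.
heights = [(1981, 1.20), (1985, 1.30), (1988, 1.25), (1990, 1.40), (1992, 1.50)]

for label, (n, mean, sd) in summarize_by_range(heights).items():
    print(f"{label}: n={n}, mean={mean:.3f}, SD={sd:.3f}")
```

Whether the gap between the two means is "significant" is exactly the open question above; the sketch only produces the numbers such a graph would display.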
3) How far removed from the raw census data do you have to be before you are allowed to attribute statistical significance to the results? In other words, once you start performing tests on the raw data (generating statistics from it), is it then valid to examine statistical significance?
4) Is a census population statistically the same as a finite population and do the same statistical rules apply to both?
Thanks in advance...and sorry for the basic nature of these questions.