Skewed Population distribution.

Locolindo

New Member
Hi there,
I have some questions regarding samples and population distributions.

Does a skewed sample distribution of n ~50 reflect the skewness of the population it came from? Does it have the same shape (e.g. both skewed to the right)?

When looking at the sample’s descriptive statistics, I can see that it has a skewness of 25, so can I infer that the population is also skewed in this manner? I cannot retest.

hlsmith

Not a robit
Here is the million dollar question: how big is the population? If it is N=51, well yeah, you probably can. If it is N=1,000,000,000, nah, you probably can't.

spunky

Doesn't actually exist
Not necessarily. The sample skewness is a statistic that's subject to sample-to-sample variability. The best option would be to put a confidence interval around your value and assume the population skewness is somewhere in that range.
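One way to get such an interval is a percentile bootstrap. The sketch below is only an illustration: the data are made-up lognormal values standing in for the real sample of about 50, the skewness is the plain moment-based version (g1 = m3 / m2^1.5, with no small-sample correction), and the function names are hypothetical, not from any package.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return m3 / m2 ** 1.5

def bootstrap_skewness_ci(x, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the skewness of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Each row of idx picks one bootstrap sample (n draws with replacement).
    idx = rng.integers(0, n, size=(n_boot, n))
    stats = np.array([sample_skewness(x[row]) for row in idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Hypothetical right-skewed data; replace with the actual 50 observations.
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=0.75, size=50)

g1 = sample_skewness(data)
ci_low, ci_high = bootstrap_skewness_ci(data)
print(f"skewness = {g1:.2f}, 95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With n around 50 the interval is usually wide, which is the point: the sign of the skewness may be clear while its size is not.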

Locolindo

New Member
Here is the million dollar question, how big is the population? If it is N=51, well yeah probably you can. If it is N=1,000,000,000 - nah you probably can't.

hlsmith

Not a robit
So there is a super-population and your sample is about 0.1% of it. The next question may be: how was the sample derived? Was it a random sample, a convenience sample, etc.?

Locolindo

New Member
Dear hlsmith
Thank you so much for your help.
I am pretty sure the sample was drawn at random. Sorry, a lot of this research was done before my time at this institution.

hlsmith

Not a robit
So what you are looking out for is selection bias. When respondents can choose whether to participate, their choice may be associated with their responses (i.e., only very satisfied or very unsatisfied individuals provide feedback). So if the sample was not random and unbiased, the values may not reflect the super-population. Also, think about your sample size relative to the super-population size: with a small enough sample, you may by chance not get a representative signal. For example, a fair coin is a random variable with a 50-50 probability of heads or tails. But if a sample of flips is not large enough, I can, by chance, get a long run of heads, making the coin look like its underlying probability of heads is higher than 50%. The larger the sample, the closer the estimate gets to the truth. So if you have a small sample, it may be hard to generalize to the population, especially if the sample is not random.

Those are the issues you have to address when generalizing back to the population.
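The coin example can be checked with a quick simulation. This is just a sketch with simulated fair-coin flips; the seeds and the helper name `spread_of_share` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=100_000)  # 1 = heads, 0 = tails, fair coin

# Share of heads in the first n flips: noisy for small n, near 0.5 for large n.
for n in (10, 50, 1_000, 100_000):
    share = flips[:n].mean()
    print(f"n = {n:>7,}: share of heads = {share:.3f}")

def spread_of_share(n, n_rep=2_000, seed=2):
    """Std. dev. of the heads share across n_rep repeated samples of size n."""
    r = np.random.default_rng(seed)
    return r.binomial(n, 0.5, size=n_rep).std() / n

for n in (10, 50, 1_000):
    print(f"n = {n:>5,}: spread of the share = {spread_of_share(n):.3f}")
```

Note that the spread depends on the number of flips n, not on how many flips the coin could ever produce.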

GretaGarbo

Human
Sorry, I can't see any question about a superpopulation, so I don't understand that. In my view the population size does not matter. The usual model is that you draw a sample from an infinite normal population. You can weigh yourself an infinite number of times, for example.

The OP believed that the sample was random and asked about the skewness. But was the estimated skewness really 25? For a normal distribution and a sample size of 50 I would not be surprised if the estimated skewness was -1 or +1. But 25 is very high.
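To get a feel for these numbers, one can simulate many normal samples of size 50 and look at the spread of the estimated skewness. The sketch below assumes the plain moment-based skewness g1 = m3 / m2^1.5; software often reports a slightly adjusted version. It also checks the most lopsided sample possible, 49 identical values plus one outlier. As far as I recall, that case attains the known upper bound (n-2)/sqrt(n-1) for this statistic, so treat the bound as an assumption to verify.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness g1 = m3 / m2**1.5 along the last axis."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean(axis=-1, keepdims=True)
    m2 = np.mean(d ** 2, axis=-1)
    m3 = np.mean(d ** 3, axis=-1)
    return m3 / m2 ** 1.5

n = 50
rng = np.random.default_rng(0)

# 20,000 simulated normal samples of size n, one skewness estimate per sample.
sims = sample_skewness(rng.normal(size=(20_000, n)))
print(f"std of simulated skewness: {sims.std():.3f}")
print(f"share with |skewness| > 1: {np.mean(np.abs(sims) > 1):.4f}")
print(f"largest |skewness| seen:   {np.abs(sims).max():.3f}")

# Most extreme case: n-1 identical values and a single outlier.
one_outlier = np.array([0.0] * (n - 1) + [1.0])
bound = (n - 2) / np.sqrt(n - 1)
print(f"one-outlier skewness: {sample_skewness(one_outlier):.3f}")
print(f"(n-2)/sqrt(n-1):      {bound:.3f}")
```

If the bound holds, a moment-based skewness of 25 cannot come from 50 observations, so the reported value may be a typo or a different statistic.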

But yes, from a random sample of 50 you can generalize to the population. But, as Spunky said, use a confidence interval.

hlsmith

Not a robit
Greta, what are you proposing for the calculation for confidence intervals? Bootstrapping of the 0.1% sample?

GretaGarbo

Human
Yes, bootstrapping is a good idea. The size of the population is irrelevant.

hlsmith

Not a robit
How about if it was "1"? How is that irrelevant to deriving accurate parameter estimates? Also, someone assuming the sample is random and it actually being random are two different things. I haven't seen many truly random samples in my career.

Dason

@hlsmith - I might be misunderstanding, but are you trying to claim that the proportion of the population sampled is the important piece and that raw sample size isn't what matters?

hlsmith

Not a robit
@Dason, I am just dorking around, but I am trying to bring attention to the vagueness of whether the sample is random or not, and to the fact that the relative sample size (so yeah, the proportion) is small, so its generalizability back to the population could be questionable. Secondarily, we do not know the context particulars that could also call the generalizability into question. For example, the data generating process of the random variable of interest may be complex or a mixture: say we are looking at human weights and certain subgroups in the population (e.g., genders, ages, SES, or combinations of these) are not represented at all or are over-represented. If I grab 0.1% of people's weights from my hallway at work and the sample is not completely at random but voluntary, my estimate could be drastically off even if I bootstrap the CI.
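The voluntary-sample concern can be sketched with a toy simulation. Everything here is hypothetical: the population is a made-up two-group mixture of weights, and the volunteering rule (lighter people are more willing to respond) is an assumption chosen only to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population: mixture of two subgroups with different mean weights (kg).
N = 50_000
in_group_a = rng.random(N) < 0.5
weights = np.where(in_group_a, rng.normal(65, 10, N), rng.normal(85, 12, N))

n = 50
random_sample = rng.choice(weights, size=n, replace=False)

# Assumed voluntary response: probability of volunteering falls as weight rises.
p_volunteer = 1.0 / (1.0 + np.exp((weights - 75) / 5))
p_volunteer /= p_volunteer.sum()
voluntary_sample = rng.choice(weights, size=n, replace=False, p=p_volunteer)

def bootstrap_mean_ci(x, n_boot=5000, seed=4):
    """95% percentile bootstrap interval for the mean of x."""
    r = np.random.default_rng(seed)
    means = r.choice(x, size=(n_boot, len(x)), replace=True).mean(axis=1)
    return np.quantile(means, [0.025, 0.975])

print(f"population mean:       {weights.mean():.1f}")
print(f"random sample mean:    {random_sample.mean():.1f}")
print(f"voluntary sample mean: {voluntary_sample.mean():.1f}")
print("bootstrap CI (random):   ", np.round(bootstrap_mean_ci(random_sample), 1))
print("bootstrap CI (voluntary):", np.round(bootstrap_mean_ci(voluntary_sample), 1))
```

The bootstrap only resamples what was collected, so it describes the noise in the voluntary sample, not its bias.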

GretaGarbo

Human
I just believe that the estimate of the skewness works in the same way as an estimated mean from a finite population.

There the usual formula for the variance of the mean is: s^2/n

But when you are sampling from people, a finite population, you can add a finite population term, so that the variance of the mean is:

(s^2/n)*((N-n)/(N-1))

Where s is the standard deviation, n is the sample size and N is the population size.

But when N is large in relation to n, the finite population term becomes close to 1, so the population size does not matter. Here it is (40,000-50)/(40,000-1), which is also close to one.

Another example: how big a sample do you need to investigate Iceland (300,000 people), and how big for the USA (300 million people)? You need the same number (since the finite population term is close to one in both cases)!
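The numbers are easy to check. This small sketch evaluates the finite population term (N-n)/(N-1) for n = 50 and the population sizes mentioned above; the function name `fpc` is just a label for illustration.

```python
def fpc(N, n):
    """Finite population term for the variance of the mean: (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

n = 50
populations = [
    ("population in this thread", 40_000),
    ("Iceland", 300_000),
    ("USA", 300_000_000),
]
for label, N in populations:
    print(f"{label:>26}: N = {N:>11,}  term = {fpc(N, n):.6f}")
```

In all three cases the term differs from 1 only in the third decimal place or later, so it barely changes the variance.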

hlsmith

Not a robit
Thanks. I hadn't seen this finite population term applied before. So if you have a relatively small sample you accept the estimate as is; otherwise you shrink just the variance term? How does this come into play with the standard CI calculation or the use of the bootstrap?

GretaGarbo

Human
Well, if your population of interest is the N=45 people in your office corridor and you want to know something about their mean, and you have interviewed 15 of them, then you have talked to 1/3 of the population, so you know a lot about the whole population.

So the variance will be as before, but multiplied by the finite population correction (often called "fpc"), which is now about 2/3 ((45-15)/(45-1) ≈ 0.68), so the standard error will be smaller than if the population had been large (say in several houses around you).

This would be the standard error for the mean. So then there would be no need for bootstrapping.
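As a worked version of the corridor example, here is the standard error of the mean with and without the correction. The sample standard deviation `s` is a made-up value; only N = 45 and n = 15 come from the example.

```python
import math

N, n = 45, 15
s = 12.0  # hypothetical sample standard deviation

var_factor = (N - n) / (N - 1)      # correction applied to the variance
se_plain = s / math.sqrt(n)         # standard error ignoring the population size
se_fpc = se_plain * math.sqrt(var_factor)

print(f"variance factor:        {var_factor:.3f}")
print(f"standard error factor:  {math.sqrt(var_factor):.3f}")
print(f"SE without correction:  {se_plain:.2f}")
print(f"SE with correction:     {se_fpc:.2f}")
```

The factor of roughly 2/3 applies to the variance; the standard error shrinks by its square root.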

What is the standard error for skewness? I don't know. I guess that with bootstrapping and a small population one could adjust with the fpc, but I don't know. But since the OP said that the population was about 40,000, I didn't believe there was any need for an fpc.