Central limit theorem and sampling distribution - Doubts

#1
While studying statistics, I learned that the sampling distribution of a sample statistic such as the mean tends to a Gaussian as n -> inf, regardless of whether the variable follows a Gaussian distribution in the population the sample comes from. This follows from the central limit theorem.

In other words, if I have understood correctly, when n is sufficiently large (there are various conventions, for example n > 30 or n > 100), the sampling distribution will be approximately Gaussian, regardless of the shape of the population distribution. But now I suspect that I have always misunderstood the notion of a sampling distribution. In practice, suppose I want to study the distribution of height in English males, and suppose it doesn't follow a Gaussian distribution. With n > 30, the sample mean would be approximately normal: X̄ ≈ N(mu, sigma^2/n).
But what is meant by a sampling distribution, for example with n = 30? I had understood the sampling distribution like this:

For example, I have a sample of 3 boys with 3 associated heights: X1=175 cm, X2= 180 cm, X3= 190 cm.
The sampling distribution then contains all the possible means within this sample: (X1+X2+X3)/3, (X1+X2)/2, (X1+X3)/2, (X2+X3)/2.


Therefore, I had understood that if I take a sample of at least 30 people, the distribution that contains all the possible averages within this sample will tend to a normal distribution, even though the population distribution of height in English males is not symmetrical (to return to the previous example).
On the other hand, from other online examples, I had interpreted the n > 30 conventionally required to apply the central limit theorem as "a sample containing at least 30 means, each computed from a different sample". So, if I have a sufficiently large number n of means from n samples, I can approximate their distribution by a normal one.
Would anyone be able to clear up my confusion about n, the sampling distribution and the central limit theorem in general?
 

obh

Active Member
#2
Hi,


You take a sample of 30 people and record their answer to a specific question, and you repeat this 1000 times:
sample1: [2, 7, 5, ..., 1], average(sample1) = 2.1
sample2: [6, 1, 2, ..., 3], average(sample2) = 2.8
sample3: [3, 6, 4, ..., 2], average(sample3) = 1.9
...
sample1000: [4, 5, 1, ..., 2], average(sample1000) = 2.2

The averages will be approximately normally distributed: if you draw a histogram of the 1000 values (2.1, 2.8, 1.9, ..., 2.2), it will look similar to the normal bell curve.
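
The procedure above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the population is taken to be exponential with mean 1 (a skewed distribution I picked for the demo, not the survey answers from the example), and the names `SAMPLE_SIZE`, `NUM_SAMPLES` and `draw_sample` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # n: number of people in each sample
NUM_SAMPLES = 1000   # how many times the sampling is repeated


def draw_sample(size):
    # Assumed population: exponential with mean 1 and sd 1 (clearly skewed).
    return [random.expovariate(1.0) for _ in range(size)]


# One average per sample: these 1000 numbers are what the histogram is made of.
sample_means = [statistics.fmean(draw_sample(SAMPLE_SIZE))
                for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of the averages: {mean_of_means:.3f}")
print(f"sd of the averages:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {1.0 / SAMPLE_SIZE ** 0.5:.3f}")

# Crude text histogram of the averages, in bins of width 0.1.
bins = {}
for m in sample_means:
    key = round(m, 1)
    bins[key] = bins.get(key, 0) + 1
for key in sorted(bins):
    print(f"{key:4.1f} {'#' * (bins[key] // 10)}")
```

The text histogram should come out roughly bell-shaped and centred near the population mean, even though the individual values are far from normal.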
 
#3
Hi,

You take a sample of 30 people, and you repeat this 1000 times:
sample1: [175cm, 181cm, ..., 167cm], average(sample1) = 176.2
sample2: [175cm, 181cm, ..., 167cm], average(sample2) = 177.9
sample3: [175cm, 181cm, ..., 167cm], average(sample3) = 175.1
...
sample1000: [175cm, 181cm, ..., 167cm], average(sample1000) = 176.2

The averages will be approximately normally distributed: if you draw a histogram of the 1000 values (176.2, 177.9, 175.1, ..., 176.2), it will look similar to the normal bell curve.
Ok, so if I understood correctly, the sampling distribution is the distribution of the averages of many samples drawn from the same population, each with its associated probability (in the example, 1000 averages from 1000 samples of 30 people). So now I'm wondering: what does n -> inf refer to in the theorem? Is n the sample size (30 in the example) or the number of times the sampling is repeated, i.e. the number of samples (1000 in the example)?
In other words, to benefit from the central limit theorem in practice, do I need to focus on the size of each sample (30) or on the number of times I draw a sample (1000)?
 

obh

Active Member
#4
Hi HS,

My example was not ideal, since height is already approximately normally distributed in the population, so I changed it to something else.
If you draw only 4 values from a normal distribution, the histogram won't look like a bell, so I chose 1000 repetitions to get a clearer bell shape.

n is the sample size. The number of times the sampling is repeated only tells you how many values you draw from the distribution of the average; it is like drawing that many values from a normal distribution.
When you take one sample with a large enough sample size, the distribution of its average is approximately normal.
It is like taking one value from a normal distribution with the same mean as the population, but with the smaller standard deviation of the average (sigma/sqrt(n), the standard error).
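
To make the difference between the two numbers concrete, here is a rough Python sketch. It keeps the number of repetitions fixed and changes only the sample size n. The exponential population and the helper names `skewness` and `sample_means` are my own assumptions for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

REPETITIONS = 10000  # number of samples; only controls how smooth the histogram is


def skewness(values):
    # Sample skewness: close to 0 for a symmetric, bell-shaped distribution.
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n, repetitions):
    # Assumed population: exponential with mean 1 and sd 1 (skewness 2).
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]


results = {}
for n in (2, 30, 200):
    means = sample_means(n, REPETITIONS)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:3d}  sd of averages={results[n][0]:.3f}  "
          f"sigma/sqrt(n)={1.0 / n ** 0.5:.3f}  skewness={results[n][1]:.2f}")
```

With the repetitions held fixed, the skewness of the averages should shrink towards 0 as n grows, and their standard deviation should track sigma/sqrt(n). Raising `REPETITIONS` alone would only give a smoother picture of the same shape.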

PS: this does not always hold. For example, the original distribution needs to have a finite variance.
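
A standard illustration of this caveat is the Cauchy distribution, which has no finite variance (or mean). The sketch below is only a demonstration under my own choices: the helper names `cauchy`, `exponential` and `iqr_of_means` are hypothetical, and I measure spread with the interquartile range because a standard deviation is not meaningful for Cauchy data.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable


def cauchy():
    # Standard Cauchy draw via the inverse-CDF method.
    return math.tan(math.pi * (random.random() - 0.5))


def exponential():
    # Finite-variance population, for comparison.
    return random.expovariate(1.0)


def iqr_of_means(draw, n, repetitions=2000):
    # Interquartile range of the averages of `repetitions` samples of size n.
    means = [statistics.fmean([draw() for _ in range(n)])
             for _ in range(repetitions)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


sizes = (1, 30, 500)
cauchy_iqr = {n: iqr_of_means(cauchy, n) for n in sizes}
expo_iqr = {n: iqr_of_means(exponential, n) for n in sizes}

for n in sizes:
    print(f"n={n:3d}  IQR of averages: cauchy={cauchy_iqr[n]:.2f}  "
          f"exponential={expo_iqr[n]:.3f}")
```

For the exponential population the spread of the averages shrinks as n grows, as the theorem describes. For the Cauchy population it stays about the same, because the average of n Cauchy values is itself Cauchy distributed.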