Hi folks,
The disclaimer first: I'm not primarily trained in statistics, so this problem may sound naive to experienced statisticians. Your expertise is what I'm looking for, and any help is highly appreciated.
Now the problem:
I am supposed to calculate total chocolate consumption in a population. I also need per capita chocolate consumption for the overall population, for the rural and urban areas, by age group, and by income group.
A primary survey was conducted on a representative random sample of the population, in which annual chocolate consumption data were collected. The sample was drawn using two-tier stratification: Tier I by geographical region, and Tier II by rural versus urban within each geographical region. This is illustrated as follows:
Sample size: 1200 persons
Number of regions: 4
Sample size per region: 300
Within each region the sample was further subdivided into rural and urban based on the population mix of that region.
Once the survey data was received, aggregations were done as follows:
Total consumption (T) = sum of consumption in each region (T1 + T2 + T3 + T4)
Consumption in a region = consumption in the region's urban stratum + consumption in the region's rural stratum [e.g. T1 = Tu1 + Tr1, etc.]
Consumption in a region's urban/rural stratum = (sample consumption in the stratum / sample size of the stratum) * population of the stratum
[e.g. Tu1 = (Cu1 / Su1) * Pu1 ; Tr1 = (Cr1 / Sr1) * Pr1]
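To make the aggregation concrete, here is a minimal Python sketch of the stratum expansion described above. The numbers are made up purely for illustration (they are not my survey results), only two regions are shown for brevity, and the names `strata` and `stratum_total` are my own.

```python
# Hypothetical stratum data: sample consumption C, sample size S, population P.
# All numbers are invented for illustration; they are NOT the survey results.
strata = {
    ("region1", "urban"): {"C": 450.0, "S": 180, "P": 90_000},
    ("region1", "rural"): {"C": 120.0, "S": 120, "P": 60_000},
    ("region2", "urban"): {"C": 300.0, "S": 100, "P": 200_000},
    ("region2", "rural"): {"C": 200.0, "S": 200, "P": 400_000},
}


def stratum_total(C, S, P):
    """Expand the stratum sample mean (C / S) to the stratum population P."""
    return C / S * P


# Tu1, Tr1, Tu2, Tr2, keyed by (region, area).
totals = {key: stratum_total(**values) for key, values in strata.items()}

# T = sum of all stratum totals (equivalently, the sum of the region totals).
T = sum(totals.values())
```

With these invented inputs, each stratum total is just the sample mean scaled up by the stratum population, and T is their sum.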
Now that we have the total consumption, calculating overall per capita consumption is fairly simple: Y = T / P, where P is the total population (the sum of all the stratum populations).
Per capita consumption by rural/urban stratum is calculated as:
Yu = (Tu1 + Tu2 + Tu3 + Tu4) / (Pu1 + Pu2 + Pu3 + Pu4)
Yr = (Tr1 + Tr2 + Tr3 + Tr4) / (Pr1 + Pr2 + Pr3 + Pr4)
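The per capita formulas can be sketched the same way. Again, the stratum totals and populations below are made-up illustrative values (two regions only), and `per_capita` is a hypothetical helper name of my own.

```python
# Hypothetical stratum totals (T) and populations (P), keyed by (region, area).
# Invented numbers for illustration only.
totals = {
    ("region1", "urban"): 225_000.0,
    ("region1", "rural"): 60_000.0,
    ("region2", "urban"): 600_000.0,
    ("region2", "rural"): 400_000.0,
}
pops = {
    ("region1", "urban"): 90_000,
    ("region1", "rural"): 60_000,
    ("region2", "urban"): 200_000,
    ("region2", "rural"): 400_000,
}


def per_capita(keys):
    """Estimated total over the given strata divided by their population."""
    keys = list(keys)
    return sum(totals[k] for k in keys) / sum(pops[k] for k in keys)


Y = per_capita(totals)                                    # overall: T / P
Yu = per_capita(k for k in totals if k[1] == "urban")     # urban strata only
Yr = per_capita(k for k in totals if k[1] == "rural")     # rural strata only
```

Because Y is a population-weighted average of Yu and Yr, it always lies between the two, which is the behaviour I would also expect from a properly weighted breakdown by any other grouping.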
The problem is that when I try to calculate per capita consumption by income group and by age group, I don't have data on the distribution of the population by income and age. Hence I resort to the crude method of using the unweighted sum of consumption in the sample for these calculations, i.e. the unweighted sample mean within each group. This leads to problems: e.g. if chocolate consumption is far higher in one of the geographical regions than in all the others, the overall per capita consumption falls outside the range of the per capita consumption figures by age.
For example, in my data, the results are as follows:
Age group    Per capita chocolate consumption
0 - 7        2.75
8 - 12       4.07
13 - 19      4.86
20 - 35      7.42
36 - 45      7.65
46 - 60      8.58
Above 60     10.88
whereas in the overall stratified sample, the per capita consumption is only 1.7 units.
A similar discrepancy is observed in the distribution by income group.
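To show how this kind of discrepancy can arise, here is a small Python sketch with entirely made-up respondents (not my data). It assumes two hypothetical strata, "A" and "B", sampled equally even though their populations are very different, and compares unweighted age-group means with the stratified overall figure.

```python
from collections import defaultdict

# Made-up respondents: (stratum, age_group, consumption). Illustration only.
respondents = [
    ("A", "young", 1.0), ("A", "old", 1.0),
    ("B", "young", 9.0), ("B", "old", 11.0),
]
# Hypothetical stratum populations: A is large, B is small but high-consuming.
population = {"A": 9_000, "B": 1_000}

# Crude method: unweighted sample mean within each age group.
by_age = defaultdict(list)
for _, age, y in respondents:
    by_age[age].append(y)
unweighted_by_age = {age: sum(v) / len(v) for age, v in by_age.items()}

# Stratified method: expand each stratum mean by its population, then divide
# the estimated total by the total population.
by_stratum = defaultdict(list)
for stratum, _, y in respondents:
    by_stratum[stratum].append(y)
T = sum(sum(v) / len(v) * population[s] for s, v in by_stratum.items())
overall = T / sum(population.values())
```

In this toy case the unweighted age-group means are 5.0 and 6.0, while the stratified overall figure is 1.9, so the overall value sits below every age-group value, just as in my results. The small, high-consumption stratum counts as much as the large one in the unweighted figures but far less in the stratified one.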
I understand that the source of the problem is the non-availability of age and income distribution data for the various geographical and rural/urban strata. However, I hope a statistical solution to this problem exists.
I would highly appreciate it if anyone could advise on this and point me to some resources that I can refer to.
Thanks,
Sumeet