# Extrapolating from a small size

#### Luch

##### New Member
I gathered data on biological measurements, specifically malnutrition indicators such as weight-for-height and height-for-age. Instead of 500 records, I collected 100. Is there any way to make the results I got from the 100 valid for 500? The package used is SPSS, if it matters.

Currently, I have plotted a graph for the 100, which followed a normal curve, but when I duplicated the results to get 500, the curve was no longer normal.

#### Dason

What? I don't really understand what you're saying at all. What is it that you're plotting (the density? Some sort of scatterplot?)? If you only collected 100 what do you mean when you say "when I duplicated the results to get 500"...?

#### Luch

##### New Member
By duplication, I mean that I copied each record five times, so that I now had 500 records instead of the 100 that I collected in the field.

What I plotted was height vs. age. In biological systems, most or all parameters follow a normal curve. (Attachments 1321 and 1322)

#### Dason

What's the point of replicating? All you're doing is pretending that you're a lot more sure in those values than you actually are.
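To make that concrete, here is a minimal sketch in Python (not SPSS) using simulated values, not the poster's data. The sample, the seed, and the `standard_error` helper are all assumptions for illustration. It copies 100 values five times and compares the naive standard error of the mean before and after.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 100 simulated height-for-age z-scores (not real data).
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
duplicated = sample * 5  # every record copied five times -> 500 rows


def standard_error(values):
    """Naive standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


se_real = standard_error(sample)
se_fake = standard_error(duplicated)

print(f"mean, 100 rows: {statistics.mean(sample):.4f}")
print(f"mean, 500 rows: {statistics.mean(duplicated):.4f}")
print(f"SE,   100 rows: {se_real:.4f}")
print(f"SE,   500 rows: {se_fake:.4f}")
```

The mean does not move, but the reported standard error shrinks by a factor of roughly the square root of 5, even though no new children were measured. That is the false confidence being described.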

#### Berley

##### Member
What do you mean you "only" collected 100? If you mean that you sampled 100 individuals, why is that bad? Why do you think you need 500 cases?

Whether 100 cases is enough is totally dependent on the total number of children in the population you're studying. What is that population? If you sampled 100 children from a particular school with a total enrollment of 500, then your results are about +/- 8.6% at the 95% confidence level. If the total enrollment (population) is 5000, then your margin of error goes up to 9.5%.
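For anyone who wants to check figures like these, here is a rough sketch of the usual textbook calculation: the margin of error for a proportion with a finite population correction. It assumes a simple random sample, the worst-case proportion p = 0.5, and z = 1.96 for 95% confidence. The function name `margin_of_error` is made up for this example.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample.

    Assumes p = 0.5 (worst case) and z = 1.96 (95% confidence) by default,
    and applies the finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


moe_500 = margin_of_error(100, 500)
moe_5000 = margin_of_error(100, 5000)

print(f"n=100, N=500:  +/- {moe_500 * 100:.1f}%")
print(f"n=100, N=5000: +/- {moe_5000 * 100:.1f}%")
```

Under these assumptions the formula gives roughly 8.8% and 9.7%, close to the figures quoted above. The small gap may come from a different calculator or rounding convention.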

If you want to reduce that margin of error, you'll need to sample more children. You can't assume those missing cases will be the same as the cases you've already counted.

If you measured every child in the group (100 children out of 100 enrolled), then you're done. Your results are the results for that group.

By the way, the WHO standard that you are comparing your results to is NOT a normal distribution. It looks like it's leptokurtic -- taller and skinnier than a normal distribution.

#### Luch

##### New Member
I just thought that somehow it should be possible.

The population is around 5000, and 100 were collected.

More will still be collected, but I have been trying to get results from the 100. More precisely, what can I do to the curve of the 'duplicated' 500 to make it look exactly like that of the 100?

#### Berley

##### Member
I'm still trying to figure out what you mean when you say "valid." Do you maybe mean that you are looking for results that are "statistically significant"?

Or are you trying to make your graph look like a normal distribution? You can't force your data to be normal. You can sometimes manipulate it a bit to make it look more normal, but you don't change the data to do that. Yeah, worldwide the relationship between a child's height and weight is probably normally distributed. But that doesn't mean your population of 5,000 is normally distributed or that your sample of that 5,000 is normally distributed. Your results aren't wrong just because they aren't normally distributed.

The results you have are the results you have. The shape of your histogram or scatterplot is not going to change if you copy your existing results five times over. When you say you "duplicated" the data, what you really did was (essentially) falsify your data. You didn't really sample 500 children, so 4/5 of your data is made up. You don't want to report that.
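A quick way to see that copying changes nothing about the shape: compare the share of records in each bin before and after. The sketch below is in Python with made-up ages, and the `relative_frequencies` helper is hypothetical, not an SPSS feature.

```python
from collections import Counter

# Hypothetical ages in years for a tiny sample (illustrative values only).
ages = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]
duplicated = ages * 5  # every record copied five times


def relative_frequencies(values):
    """Share of records falling on each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in sorted(counts.items())}


original_shape = relative_frequencies(ages)
duplicated_shape = relative_frequencies(duplicated)

for value in original_shape:
    print(value, original_shape[value], duplicated_shape[value])
```

Each bar is five times taller in raw counts, but the proportions are identical, so a histogram drawn with the same bins has the same shape. If a plot of the copied data looks different, that would point to something else changing, such as the binning or axis scaling.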

There is no magic number of how many cases you are supposed to have. The sample size you need to collect is determined by the size of the entire population and how accurate you want to be. The less accurate you need to be, the fewer samples you need to take. But those samples need to be properly selected following the correct sampling procedure for your experiment/investigation. Otherwise, you can't say the sample represents the bigger group.
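As a rough illustration of how population size and desired accuracy drive the sample size, here is a sketch of the common formula for estimating a proportion, with a finite population correction. It assumes simple random sampling, p = 0.5, and 95% confidence. The 5% target margin and the function name `required_sample_size` are placeholders, not values from this thread.

```python
import math


def required_sample_size(population, margin=0.05, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion within +/- margin.

    Starts from the infinite-population size n0 = z^2 * p * (1 - p) / margin^2,
    then shrinks it with the finite population correction.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for size in (500, 5000):
    print(f"population {size}: sample at least {required_sample_size(size)}")
```

With these assumptions, a population of 5000 calls for about 357 children for a 5% margin, and a population of 500 for about 218. Loosening the margin lowers both numbers, which is the trade-off described above.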

You have a set of results. Those are your results. More data may change those results, but those data don't exist yet. You report on what you have, not what you think you're going to get in the future.