# Fitting a distribution to summary data?

#### SlideRule

##### New Member
I've been presented with tables of data giving the lengths of infants at various ages, and I'm asked to develop a website application that will estimate the percentages at intermediate values.

For example, the tables may say that 10% of male infants at 1.5 months of age are 55.12345 cm (not an actual value; I don’t have the data before me at the moment) in length, or less. (Don't ask me how they measure to so many decimal places!) Table values are given at 3%, 5%, 10%, 25%, 50%, 75%, 90%, 95% and 97%. I have no original data to work from, only the summary table values.

The problem, then, is to find distributions that are reasonable fits to each row of the tables. I assume that the lengths are normally distributed. Each row of the tables contains the lengths at 50% and 95%, and I use half that difference as a crude starting standard deviation. Then I use computer iteration, with a polynomial approximation for the normal CDF, to find the best standard deviation that I can: I sweep a range of nearby standard deviation values to discover which one gives the smallest sum of squared differences between the polynomial approximations and the given table values.
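A minimal sketch of that sweep, under stated assumptions: Python's `statistics.NormalDist` stands in for the polynomial CDF approximation, the 50% length is taken as the mean, and the row is synthetic, generated from a known normal distribution rather than taken from any real table. The function name `sweep_sd` and the sweep range (0.25 to 2 times the starting value) are illustrative choices only.

```python
from statistics import NormalDist

# Percentage columns listed in the tables.
PERCENTS = [3, 5, 10, 25, 50, 75, 90, 95, 97]

def sweep_sd(lengths, percents=PERCENTS, steps=2001):
    """Sweep candidate standard deviations and return (mean, sd, sse)
    for the one with the smallest sum of squared percentage errors."""
    by_pct = dict(zip(percents, lengths))
    mean = by_pct[50]                       # assumption: median used as the mean
    start = (by_pct[95] - by_pct[50]) / 2   # crude starting sd, as described
    best = None
    for i in range(steps):
        # Candidate sd runs from 0.25x to 2x the starting value (assumed range).
        sd = start * (0.25 + 1.75 * i / (steps - 1))
        dist = NormalDist(mean, sd)
        # Error is measured in percentage points at each tabulated length.
        sse = sum((100 * dist.cdf(x) - p) ** 2 for p, x in by_pct.items())
        if best is None or sse < best[2]:
            best = (mean, sd, sse)
    return best

# Synthetic row (NOT real data): lengths from a normal with mean 58, sd 2.5.
true_dist = NormalDist(58.0, 2.5)
row = [true_dist.inv_cdf(p / 100) for p in PERCENTS]

mean, sd, sse = sweep_sd(row)
print(f"mean={mean:.3f} sd={sd:.3f} sse={sse:.6f}")
```

On a row that really is normal, the sweep should land close to the generating standard deviation, so a large residual on real table rows points at the data or the error measure rather than at the search itself.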

But the final results are very poor. Many of the (square roots of the) least squares differences are large in proportion to the standard deviations, so estimates are sometimes far from the table values. For example, for a value that the table says should be 10%, my final calculated estimate may be 17%.
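To see which columns drive the misfit, one option is to list the fitted percentage next to the tabulated one for each length. The sketch below is an assumption-laden illustration: the helper names `fitted_percentages` and `rms_error` are hypothetical, `statistics.NormalDist` replaces the polynomial approximation, and the row is again synthetic (mean 58, sd 2.5), not real table data.

```python
from statistics import NormalDist

PERCENTS = [3, 5, 10, 25, 50, 75, 90, 95, 97]

def fitted_percentages(lengths, mean, sd):
    """Percentage at or below each length under a normal(mean, sd) fit."""
    dist = NormalDist(mean, sd)
    return [100 * dist.cdf(x) for x in lengths]

def rms_error(percents, fitted):
    """Root-mean-square difference, in percentage points."""
    n = len(percents)
    return (sum((f - p) ** 2 for p, f in zip(percents, fitted)) / n) ** 0.5

# Synthetic row (NOT real data) from a normal with mean 58, sd 2.5.
true_dist = NormalDist(58.0, 2.5)
row = [true_dist.inv_cdf(p / 100) for p in PERCENTS]

# Compare the generating sd with the crude starting value (x95 - x50) / 2.
crude_sd = (row[PERCENTS.index(95)] - row[PERCENTS.index(50)]) / 2
good = fitted_percentages(row, 58.0, 2.5)
bad = fitted_percentages(row, 58.0, crude_sd)

for p, g, b in zip(PERCENTS, good, bad):
    print(f"table {p:>2}%  fitted(sd=2.5) {g:6.2f}%  fitted(crude sd) {b:6.2f}%")
print("RMS error, sd=2.5:  ", rms_error(PERCENTS, good))
print("RMS error, crude sd:", rms_error(PERCENTS, bad))
```

Because the error here is expressed in percentage points, it can be compared directly against the table's percentage columns rather than against a standard deviation measured in centimetres.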

Either my method is flawed or the table values are not a good fit for an ideal normal distribution. Is there a better approach that anyone here can suggest? Thank you.