I've been reading threads off and on for a couple of days, and I can't work out how to answer a question that's stuck in my head. I'm sure the answer is in here somewhere, but I'm an engineer, not a statistician, so I can't make the leap from the posts I've read to my own question.

My question relates to the Central Limit Theorem and how it can be applied to glean information about the underlying distribution. As I understand it, if you take the mean of a large enough sample, that sample mean will be close to the population mean, with a spread that depends on the population standard deviation and the sample size. However, what does this really tell me? If the underlying distribution is normal, then I can see that the mean and standard deviation are directly useful and would let me describe the underlying distribution, but what if the underlying distribution is not normal? What value does being able to estimate the mean really have? Consider the following:

You have a bag full of identically shaped and sized stones. Each stone has a number from 1 to 10 printed on it. I want to determine the likelihood of pulling a stone with a '10' on it, and I'd also like to know what the underlying distribution is. Are the stones numbered following a uniform, normal, bimodal, or other distribution? Can I even use the Central Limit Theorem to answer this question, or do I simply need to draw a certain number of samples to get a good estimate? If I simply have to take samples, how many do I have to take?

I've coded up some Perl scripts that generate large volumes of "population data" based on distributions that I define, which I then sample and run various tests on. I was hoping that a real stats person could shed some light on the actual theory and math behind this, since my empirical tinkering isn't giving me the direction I'd like.
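
For concreteness, a minimal Python sketch of this kind of experiment (not the Perl scripts themselves) might look like the following. Everything in it is an assumption made for illustration: the bimodal weights, the bag size, the 2,000 draws, and all function names are hypothetical. It draws stones with replacement, tabulates the observed share of each number, and puts a normal-approximation interval around the share of 10s, which is one place where the Central Limit Theorem would plausibly enter.

```python
import math
import random
from collections import Counter

def make_bag(weights, size, rng):
    """Build a hypothetical bag of `size` stones numbered 1-10 using the given weights."""
    return rng.choices(range(1, 11), weights=weights, k=size)

def estimate_frequencies(bag, n, rng):
    """Draw n stones with replacement and return the observed share of each number."""
    counts = Counter(rng.choices(bag, k=n))
    return {value: counts[value] / n for value in range(1, 11)}

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion; z=1.96 is roughly 95%."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

def worst_case_sample_size(margin, z=1.96):
    """Draws needed for a half-width of at most `margin`, assuming the worst case p = 0.5."""
    return math.ceil(z ** 2 * 0.25 / margin ** 2)

rng = random.Random(42)  # fixed seed so the run is repeatable
bimodal_weights = [5, 3, 1, 1, 1, 1, 1, 1, 3, 5]  # assumed shape, purely illustrative
bag = make_bag(bimodal_weights, 100_000, rng)

n_draws = 2_000
freqs = estimate_frequencies(bag, n_draws, rng)
low, high = proportion_interval(freqs[10], n_draws)

for value, share in freqs.items():
    print(f"{value:2d}: {share:.3f}")
print(f"Estimated share of 10s: {freqs[10]:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
print(f"Draws for a +/-0.05 margin, worst case: {worst_case_sample_size(0.05)}")
```

In this sketch the table of observed shares is the estimate of the distribution's shape, while the interval only describes how uncertain a single share is. Under the stated worst-case assumption, `worst_case_sample_size(0.05)` comes out to 385 draws.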

Thanks for any help!