Hi,

just to give a quick summary: the problem was that we had a small sample taken from a skewed distribution, and we were asked whether the sample size is large enough to estimate the median and, if not, what the right sample size would be.

My idea was to use the bootstrap:

1. generate samples of a given size n

2. estimate the width of the confidence interval of the median estimates (P95 - P5, i.e. a 90% interval)

3. repeat with a slightly greater n

4. build a regression with the confidence interval width as the DV and sample size as the IV

5. use the regression to estimate the necessary sample size.
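
For illustration, steps 1 to 3 could be sketched in Python roughly as follows. This is only a minimal sketch, not the code used for the analysis: the log-normal `sample` is a made-up stand-in for the real measurements, and the function names, the grid of sizes and the 500 resamples per size are all assumptions.

```python
import random
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def median_ci_width(data, n, n_boot=500, rng=None):
    """Width (P95 - P5) of the bootstrap distribution of the median for resamples of size n."""
    if rng is None:
        rng = random.Random(0)
    # Resample with replacement, n values at a time, and record each median.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    return percentile(medians, 95) - percentile(medians, 5)

rng = random.Random(42)
# Hypothetical skewed sample standing in for the real (costly) measurements.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(40)]

sizes = [20, 40, 80, 160, 320]  # assumed grid of candidate sample sizes
widths = {n: median_ci_width(sample, n, rng=rng) for n in sizes}

for n in sizes:
    print(n, round(widths[n], 3))
```

Note that resampling more values than were actually measured only reuses the same observations, so the widths for large n depend heavily on how well the small sample represents the real distribution.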

The method seems to work in that I can see a nice shrinking of the CI width with n. The regression did not work with n as the IV, because the relationship is non-linear and saturates quite quickly (no big surprise when I think about it): R-squared was only 5%. Using 1/n gave much better results (R-squared almost 70%).
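
The regression on 1/n, and its inversion to get a sample size (steps 4 and 5), might look something like the sketch below. The widths here are synthetic, generated exactly from width = 0.1 + 8/n purely so the example runs on its own; they are not the values from the analysis, and `fit_line` and `required_n` are hypothetical helper names. As a side remark, the standard error of a sample median usually shrinks like 1/sqrt(n) for large n, so that predictor may be worth trying as well.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R-squared."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot

def required_n(intercept, slope, target_width):
    """Invert width = intercept + slope / n for n; None if the target is unreachable."""
    if target_width <= intercept:
        return None  # the fitted model never gets this narrow, however large n is
    return math.ceil(slope / (target_width - intercept))

sizes = [20, 40, 80, 160, 320]
# Synthetic CI widths for illustration only (assumed model: 0.1 + 8 / n).
widths = [0.1 + 8.0 / n for n in sizes]

# Sample size itself as the predictor: a straight line fits the curve poorly.
b0_n, b1_n, r2_n = fit_line(sizes, widths)

# 1/n as the predictor: linear by construction for this synthetic data.
b0_inv, b1_inv, r2_inv = fit_line([1.0 / n for n in sizes], widths)

print("R-squared with n:  ", round(r2_n, 3))
print("R-squared with 1/n:", round(r2_inv, 3))
print("n for width 0.14:", required_n(b0_inv, b1_inv, 0.14))
print("n for width 0.05:", required_n(b0_inv, b1_inv, 0.05))
```

The intercept acts as a floor in this model: any target width at or below it cannot be reached, which is one way an "unrealistic" sample size shows up.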

It turned out that the decrease in CI width for samples about double the size of what I actually had was not practically significant, the data being very skewed. So I simulated a data set that was very similar to the one I had (Greta's suggestion) and tried the method on that set. The effect was a lot more visible, but the decrease was again quite slow (as a dependence on 1/n would suggest). A better model would probably give more insight into the number of samples needed, but as measurements are quite costly in our case, it did not make much sense for me to pursue this further.
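
One possible way to simulate a data set similar to a small skewed sample is to fit a simple parametric distribution and draw from it. The sketch below assumes a log-normal shape and strictly positive measurements, which may or may not match the real data; the `observed` values are a made-up placeholder and `simulate_like` is a hypothetical helper name.

```python
import math
import random
import statistics

def simulate_like(sample, size, rng):
    """Draw `size` values from a log-normal fitted to `sample` via the log mean and log sd."""
    logs = [math.log(x) for x in sample]  # requires all values > 0
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return [rng.lognormvariate(mu, sigma) for _ in range(size)]

rng = random.Random(1)
# Hypothetical placeholder for the small real sample.
observed = [rng.lognormvariate(1.0, 0.8) for _ in range(40)]

simulated = simulate_like(observed, 5000, rng)
print("simulated median:", round(statistics.median(simulated), 3))
print("simulated mean:  ", round(statistics.mean(simulated), 3))
```

The simulated set can then be subsampled at any size without reusing observations, which is probably why the effect of n shows up more clearly there than in the bootstrap on the original data.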

In summary, the idea seems to work, but care must be taken in building the model of CI width vs. sample size. It might also turn out that the needed sample size is completely unrealistic.

regards

rogojel