Monte Carlo simulation: how many runs?

#1
Hi,

Most papers I have read on the Monte Carlo (MC) simulation method say that there is no rule for the minimum number of runs required. However, the book Risk Analysis by D. Vose gives the following formula for the number of runs needed to estimate the true mean:

\(n>\left ( \frac{\Phi ^{-1}\left ( \frac{1+\alpha }{2} \right )\sigma }{\delta } \right )^{2}\)

n - number of runs
Φ^(-1) - the inverse of the standard normal cumulative distribution function
sigma - standard deviation
delta and alpha - desired error (tolerance) and confidence level

(The formula is derived from the asymptotic distribution of the estimate of the true mean, given by the central limit theorem, under the assumption that with Monte Carlo sampling each generated value Xi is an iid random variable.)
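
As a rough illustration of how the formula is used, here is a minimal Python sketch. The function name `min_runs` and the example inputs are hypothetical and do not come from Vose's book, and sigma is assumed to be known (in practice it would probably have to be estimated, for instance from a small pilot batch of runs).

```python
from math import floor
from statistics import NormalDist


def min_runs(sigma, delta, alpha):
    """Smallest integer n with n > (Phi^{-1}((1 + alpha) / 2) * sigma / delta)^2.

    sigma: standard deviation of a single run's output (assumed known)
    delta: tolerated absolute error of the estimated mean
    alpha: desired confidence level, e.g. 0.95
    """
    z = NormalDist().inv_cdf((1 + alpha) / 2)  # standard normal quantile
    bound = (z * sigma / delta) ** 2
    return floor(bound) + 1  # strict inequality, so step past the bound


print(min_runs(sigma=1.0, delta=0.1, alpha=0.95))  # 385
```

So estimating the mean to within a tenth of a standard deviation at 95% confidence needs about 385 runs under these assumptions.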

The formula is derived for random draws from a univariate distribution. My question is: would it still be valid if I have to make random draws using MC from a multivariate distribution? If not, is there any way to derive a similar formula for the multivariate case based on the same assumptions as for the univariate case?

Thank you.
 

Miner

TS Contributor
#2
It would be interesting to know when that book was written. The only reason to limit the number of MC simulations is if those simulations take an unacceptable amount of time to run. In the early days, it might take a day to run 10k simulations. Today, I can run a million simulations in 15 seconds. Why limit the number of simulations?
 

Dragan

Super Moderator
#3
It really depends on what you're doing. Of course, Miner is correct if an individual is conducting a MC simulation that involves, for example, a t-test or One-Way ANOVA. However, if you're doing something that involves, say, empirical Bayesian estimation using a technique such as Markov Chain Monte Carlo methods (e.g. Gibbs Sampling) in the context of Item Response Theory (IRT), then things can get really computationally expensive. In that case, the cost really would limit the number of replications in the MC study, no matter how advanced your desktop computer/software is.
 

Dason

Ambassador to the humans
#4
No matter how advanced our computing power gets we will always be asking questions that require computational power outside the reach of what we can easily do in a "short" amount of time.
 

noetsi

Fortran must die
#5
I ran into this issue recently with multiple imputation. The early recommendations were to run, say, 5 imputations. Now they recommend hundreds or thousands. The reason is that the results get better the more you run. Old recommendations were based on less powerful computers that took a long time to run each imputation. Once you reach a minimum number, the trade-off is between increased accuracy and the time it takes to run more.

Which is a reason to be cautious about software comments from a decade or more ago. My last computer, bought about 3 years ago, had 8 gigs of memory (still a lot compared with the past). The one I got this year has 388 gigs. That makes a lot of difference (as does having 4 processors rather than two, I would guess) :p
 

noetsi

Fortran must die
#7
No, it is a developer's computer designed to handle really large queries with massive amounts of data from relational tables very fast. It is a very expensive, very high-end computer (they cost thousands of dollars). Because I am the only one who runs stats in a large state agency, they bought it for me.

The servers I use are entirely different....
 

noetsi

Fortran must die
#8
I checked with IT and I lied. I have 388 gigs of volatile memory on my hard drive. I only have 10 gigs of RAM....
 
#9
One run of the model I'm working with takes several hours on one of the most powerful computers in our university. So if it's 100 runs, Monte Carlo is feasible; if it's 1000 runs, it is not. That's why it is important for me to understand how to determine the minimum number of runs required.
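
Since the run budget is what is fixed here, it may help to turn the formula from post #1 around and ask what error a given number of runs can support: solving for delta gives \( \delta = \Phi^{-1}\left(\frac{1+\alpha}{2}\right)\sigma/\sqrt{n} \). Below is a small Python sketch under the same CLT and iid assumptions. The function name `achievable_delta` is made up for illustration, and sigma is set to 1 so that the result reads in units of the standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def achievable_delta(n, sigma, alpha):
    """Half-width delta such that Pr(|mean estimate - true mean| < delta) is about alpha."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)  # standard normal quantile
    return z * sigma / sqrt(n)


for n in (100, 1000):
    # sigma = 1.0 means delta is expressed in standard deviations
    print(n, round(achievable_delta(n, sigma=1.0, alpha=0.95), 3))
# 100 0.196
# 1000 0.062
```

With 100 runs the mean is pinned down to roughly 0.2 standard deviations at 95% confidence, and ten times as many runs only tightens that by a factor of sqrt(10), about 3.2.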
 
#11
Yes, it is multivariate (around 8-10 components).

So I wonder if the formula I presented can be used to justify the number of runs for the multivariate case.
Or if there is any rule of thumb to determine the minimum number of draws required depending on the dimension of the multivariate distribution.

There are papers published with 100 runs for similar problems but this number of runs is never justified by anything but the computational time.
 

maartenbuis

TS Contributor
#12
What is the purpose of the Monte Carlo simulation? For example, are you trying to check or estimate a p-value or a confidence interval or are you trying to get an idea about the consistency of the point-estimate? What is your model? What estimates from that model are you interested in?
 

BGM

TS Contributor
#13
So if you look back at the derivation,

\( \Pr\{|\bar{X}_n - \mu| < \delta\} > \alpha \) where \( \mu \) is the true mean

\( \Rightarrow \Pr\left\{\sqrt{n}\frac {|\bar{X}_n - \mu|} {\sigma} <
\frac {\delta\sqrt{n}} {\sigma} \right\} > \alpha \)

\( \Rightarrow \Phi\left(\frac {\delta\sqrt{n}} {\sigma}\right) -
\Phi\left(-\frac {\delta\sqrt{n}} {\sigma}\right) > \alpha \) (approximated by the CLT)

\( \Rightarrow 2\Phi\left(\frac {\delta\sqrt{n}} {\sigma}\right) - 1
> \alpha \)

\( \Rightarrow \Phi\left(\frac {\delta\sqrt{n}} {\sigma}\right) > \frac {1 + \alpha} {2} \)

\( \Rightarrow \frac {\delta \sqrt{n}} {\sigma} > \Phi ^{-1}\left(\frac{1 + \alpha} {2}\right) \)

\( \Rightarrow n > \left ( \frac{\Phi ^{-1}\left ( \frac{1+\alpha }{2} \right )\sigma }{\delta } \right )^{2}\)


So if you want to extend it to a multivariate case, where \( \bar{X}_n \) is a \( p \)-dimensional random vector instead, then you will need a vector norm to measure the distance between the sample mean and the population mean. One possible choice is the Mahalanobis distance, for which we can make use of a result from the multivariate CLT:

\( n(\bar{X}_n - \mu)^{T}\Sigma^{-1}(\bar{X}_n - \mu) \stackrel {d} {\to} \chi^2(p) \)

provided that the random sample itself satisfies the assumptions of the multivariate CLT.
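
Following the same steps as in the univariate derivation, requiring \( \Pr\{(\bar{X}_n - \mu)^{T}\Sigma^{-1}(\bar{X}_n - \mu) < \delta^2\} > \alpha \) and using the chi-square limit gives \( n > F^{-1}_{\chi^2(p)}(\alpha)/\delta^2 \), where \( \delta \) is now a tolerance on the Mahalanobis distance (so it is in standardized units, not in the original units of X). The Python sketch below evaluates that bound. The function names are hypothetical, and the chi-square quantile is computed with a simple series plus bisection only to keep the example standard-library; `scipy.stats.chi2.ppf` would be the usual choice if SciPy is available.

```python
from math import exp, floor, lgamma, log


def chi2_cdf(x, p):
    """CDF of chi-square(p), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = p / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of sum_k y^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * exp(a * log(y) - y - lgamma(a))


def chi2_quantile(alpha, p):
    """Inverse of chi2_cdf by bisection (adequate for a sketch)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, p) < alpha:   # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, p) < alpha:
            lo = mid
        else:
            hi = mid
    return hi


def min_runs_mahalanobis(p, delta, alpha):
    """Smallest integer n with n > chi2_quantile(alpha, p) / delta^2.

    p: dimension of the random vector
    delta: tolerance on the Mahalanobis distance between sample and true mean
    alpha: desired confidence level
    """
    return floor(chi2_quantile(alpha, p) / delta ** 2) + 1


print(min_runs_mahalanobis(p=1, delta=0.1, alpha=0.95))   # 385
print(min_runs_mahalanobis(p=10, delta=0.5, alpha=0.95))  # 74
```

For p = 1 this reproduces the univariate formula with sigma = 1, because the 0.95 quantile of \( \chi^2(1) \) is the square of \( \Phi^{-1}(0.975) \). The bound grows with the dimension p through the chi-square quantile, and it relies on the asymptotic approximation, which may be rough when n is small.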