Parallel computing and pseudo random number generating (PRNG)

TheEcologist

Global Moderator
#1
Hi Guys,

I'm working on parallel MCMC algorithms both in R and C, but I'm kindoff worried about the independence between the sequences in my p parallel streams/threads.

I know setting a new random seed for each stream would be required but does that really insure independence? I don't think it provides any reinsurance that my sequences are truly independent.

PRNGs are typically also periodic, which means that the sequence will eventually repeat itself every length T (see e.g. --> http://www.random.org/randomness/)

Another way this could be done is by making sure I choose starting positions in the PRNG sequence that don't overlap for each p streams (basically dividing the sequence T in p parts) but this doesn't really sound feasible.

Anybody worked with this problem before and got tips? I'm sure people with backgrounds in computing, math and stats have run into this problem before (other than in my field, Ecology, that is).

Any suggestions will be greatly appreciated!

TE
 

TheEcologist

Global Moderator
#2
Oke, I have been doing some research.

Basically what you need to do is use strictly non-overlapping streams, so that boils down to the following -->

1) each parallel stream will need a separate seed.
2) not all seeds are good, if seeds are not chosen well, streams will overlap and not be independent

Oke so we need to select seeds (S) so streams don’t overlap at all, which should look like this (for X steams of length 20 000)-->

• Stream 1 pick S0 as seed which with gives 1:20 000 in the RNG cycle
• For stream 2 pick S1 as seed which with gives 20 0000:40 000 in the RNG cycle
• And so forth until you end at seed SX or the cycle ends.

So crucially it depends on the cycle length.

R uses different kinds of RNG, depending on which one you use this may not be such an issue;

The "Wichmann-Hill" has cycle length of approximately 7e12 (which seems long, but is actually quite short if a typical simulation is in the range > 1e9!). You should really watch out with this one.

The "Super-Duper" has a 4.6*10^18 for most initial seeds, which is already much better but the seed input is not straightforward. It requires two integers, one is free - yet the other must be odd.

The ""Marsaglia-Multicarry" (current default) seems promising with a cycle of 2^60. It also requires two seed inputs, but they are both free integers.

Problem is I can't figure out how to select the seed so I'm sure my sequences would cycle!

Then there is the "L'Ecuyer-CMRG" which seem particulary built for multiple streams.. I'm looking into this now.

Will report back,

TE
 

TheEcologist

Global Moderator
#3
Update;

Oke, I've found a way that works.. although it's requires user resetting of .Random.seed which is not recommended.

As par usual, the good folks developing R already have some built in base solutions for this. Key is to use the RNG "L'Ecuyer-CMRG" or ‘combined multiple-recursive generator’ from L'Ecuyer (1999), which is has a cycle of around \(2^{191}\)
I don't think it is likely to cycle will be reached for all practical purposes for the ammount of streams I will have (max 32 - 64 on a GPU) :)

Next steam is to divide the cycle into bits, though this in complicated to do yourself (I haven't figured it out yet), the function RNGstreams will do it for you.
If I understand it correctly it cuts each serial ‘stream’ is a subsequence of the period of length 2^127 which is in turn divided into ‘substreams’ of length \(2^{76}\).
(that should provide you will a hell of a lot of possible substreams! In fact about \((2^{191}/2^{127}) (2^{127}/2^{76})\) if I calculated it correctly :) ).

Code:
# set RNG to "L'Ecuyer-CMRG"
RNGkind("L'Ecuyer-CMRG")

# save original seed
s0 <- .Random.seed 
s <- s0

# number of streams
ncores=4
    
    for(i in 1:ncores){
    #Reset random seed, to a subsequence of the period of length [I]2^127[/I]
    # you could also set the subsequence into a period of [I]2^76 
    # by using   parallel:::[/I]nextRNGSubSteam(s)

.Random.seed<-s<-parallel:::nextRNGStream(s)

# normally you put your parallel code below
print(rnorm(4))
  }    


#reset seed
.Random.seed<-s0