I am trying to parallelise a single MCMC chain, which is sequential in nature, so I need to preserve the order in which the iterations are executed. For this purpose, I was thinking of using an 'ordered for' loop via OpenMP. How does the execution of an ordered for loop in OpenMP actually work, and does it provide any real speed-up from parallelising the code?
Thanks!
As long as you have just a single Markov chain, the easiest way to parallelize it is to use 'embarrassing' parallelism: run a bunch of independent chains and collect the results when they are all done (or gather the results once in a while).
This way you do not incur any communication overhead whatsoever.
The main caveat here is that you need to make sure different chains get different random number generator seeds.
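To make this concrete, here is a minimal sketch in C++ with OpenMP. Everything in it is an assumption for illustration: the names `run_chain` and `run_chains` are hypothetical, the target is a standard normal sampled by random-walk Metropolis, and the seeding rule `base_seed + c` is just the simplest way to give each chain its own seed (a `std::seed_seq` or a dedicated parallel RNG may be a better choice in practice).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical example chain: random-walk Metropolis for a standard normal.
// Returns all samples of one chain.
std::vector<double> run_chain(std::uint64_t seed, int n_steps) {
    assert(n_steps >= 0);
    std::mt19937_64 rng(seed);  // generator owned by this chain only
    std::normal_distribution<double> step(0.0, 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(n_steps));
    double x = 0.0;
    for (int i = 0; i < n_steps; ++i) {
        double proposal = x + step(rng);
        // Log acceptance ratio for a density proportional to exp(-x^2 / 2).
        double log_ratio = 0.5 * (x * x - proposal * proposal);
        if (std::log(unif(rng)) < log_ratio) x = proposal;
        samples.push_back(x);
    }
    return samples;
}

// Runs n_chains independent chains, one loop iteration per chain.
// The chains share no state, so no 'ordered' clause or locking is needed.
std::vector<std::vector<double>> run_chains(int n_chains, int n_steps,
                                            std::uint64_t base_seed) {
    assert(n_chains > 0);
    std::vector<std::vector<double>> results(static_cast<std::size_t>(n_chains));
    #pragma omp parallel for
    for (int c = 0; c < n_chains; ++c) {
        // Assumption: a distinct seed per chain, derived from the chain index.
        results[c] = run_chain(base_seed + static_cast<std::uint64_t>(c), n_steps);
    }
    return results;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang), a call such as `run_chains(8, 100000, 42)` spreads the eight chains across the available threads; without that flag the pragma is ignored and the same code runs serially.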
UPD: practicalities of collecting the results.
In a nutshell, you just mix together the results generated by all the chains. For the sake of simplicity, suppose you have three independent chains:
x1, x2, x3,...
y1, y2, y3,...
z1, z2, z3,...
From these, you make a chain x1,y1,z1,x2,y2,z2,x3,y3,z3,...
This is a perfectly valid MC chain and it samples the correct distribution.
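The interleaving above can be sketched as a round-robin merge. The function name `interleave` is hypothetical, and truncating to the shortest chain is an assumption made here so that every round x_i, y_i, z_i is complete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Round-robin merge of several chains: x1, y1, z1, x2, y2, z2, ...
// Chains are truncated to the length of the shortest one.
std::vector<double> interleave(const std::vector<std::vector<double>>& chains) {
    assert(!chains.empty());
    std::size_t len = chains.front().size();
    for (const auto& c : chains) len = std::min(len, c.size());

    std::vector<double> merged;
    merged.reserve(len * chains.size());
    for (std::size_t i = 0; i < len; ++i)   // step index within each chain
        for (const auto& c : chains)        // visit chains in a fixed order
            merged.push_back(c[i]);
    return merged;
}
```

For plain averages the order of the merged samples does not matter, so simply concatenating the chains would give the same mean; the interleaved form just mirrors the picture above.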
Writing out the full history of every chain is almost always impractical. Typically, each chain saves its binning statistics, which you then combine and analyze with a separate program. For binning analysis see, e.g., [boulder.research.yale.edu/Boulder-2010/ReadingMaterial-2010/Troyer/Article.pdf][1]
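As a rough illustration of saving and combining per-chain statistics, here is a sketch with a fixed bin size. This is a simplified variant chosen for brevity and is not necessarily the scheme described in the linked article; the names `BinStats` and `bin_chain` are hypothetical, and the error estimate assumes the bins are long enough for the bin means to be roughly independent.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical accumulator: keeps statistics of bin means, not raw samples.
struct BinStats {
    std::size_t n_bins = 0;  // number of completed bins
    double sum = 0.0;        // sum of bin means
    double sum_sq = 0.0;     // sum of squared bin means

    // Combine with the statistics of another independent chain.
    void merge(const BinStats& other) {
        n_bins += other.n_bins;
        sum += other.sum;
        sum_sq += other.sum_sq;
    }

    double mean() const {
        assert(n_bins > 0);
        return sum / static_cast<double>(n_bins);
    }

    // Standard error of the mean, treating bin means as independent.
    double error() const {
        assert(n_bins > 1);
        double n = static_cast<double>(n_bins);
        double var = (sum_sq - sum * sum / n) / (n - 1.0);
        return std::sqrt(std::max(var, 0.0) / n);
    }
};

// Reduce one chain to its bin statistics; an incomplete last bin is dropped.
BinStats bin_chain(const std::vector<double>& samples, std::size_t bin_size) {
    assert(bin_size > 0);
    BinStats stats;
    for (std::size_t start = 0; start + bin_size <= samples.size();
         start += bin_size) {
        double bin_sum = 0.0;
        for (std::size_t i = start; i < start + bin_size; ++i)
            bin_sum += samples[i];
        double bin_mean = bin_sum / static_cast<double>(bin_size);
        ++stats.n_bins;
        stats.sum += bin_mean;
        stats.sum_sq += bin_mean * bin_mean;
    }
    return stats;
}
```

Each chain would write out only its `BinStats`, and the separate analysis program would call `merge` on them before reading off `mean()` and `error()`.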