Parallelization of PyMC


Could someone give some general instructions on how to parallelize PyMC MCMC code? I am trying to run LASSO regression following the example given here. I read somewhere that parallel sampling is done by default, but do I still need to use something like Parallel Python to get it to work?

Here is some reference code that I would like to be able to parallelize on my machine.

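import numpy as np
import pymc  # PyMC2
from scipy.stats import norm

# n and b are not shown in the question; the values below are assumptions.
n = 500   # sample size (assumed)
b = 1.0   # Laplace scale (assumed)

# x2 is almost exactly -x1, so the first two predictors are nearly collinear.
x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

# Laplace priors on the coefficients give the LASSO penalty.
beta1_lasso = pymc.Laplace('beta1', mu=0, tau=1.0 / b)
beta2_lasso = pymc.Laplace('beta2', mu=0, tau=1.0 / b)
beta3_lasso = pymc.Laplace('beta3', mu=0, tau=1.0 / b)

@pymc.deterministic
def y_hat_lasso(beta1=beta1_lasso, beta2=beta2_lasso, beta3=beta3_lasso, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_lasso = pymc.Normal('Y', mu=y_hat_lasso, tau=1.0, value=y, observed=True)

lasso_model = pymc.Model([Y_lasso, beta1_lasso, beta2_lasso, beta3_lasso])
lasso_MCMC = pymc.MCMC(lasso_model)
# 20000 iterations, the first 5000 discarded as burn-in, every 2nd sample kept
lasso_MCMC.sample(iter=20000, burn=5000, thin=2)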

Solution

  • It looks like you are using PyMC2. As far as I know, with PyMC2 you have to bring in a general-purpose Python tool for parallel computation, such as IPython.parallel. There are many ways to do this, but all the ones I know are somewhat complicated. Here is an example of one, which uses PyMC2, IPCluster, and Wakari.

    In PyMC3, parallel sampling is implemented in the psample function (later PyMC3 releases folded this into pm.sample, using the njobs argument, since renamed cores). Your reference code will need to be updated to the PyMC3 format:

    import pymc3 as pm

    # x1, x2, x3, y and b are defined as in the question.
    with pm.Model() as model:
        beta1 = pm.Laplace('beta1', mu=0, b=b)
        beta2 = pm.Laplace('beta2', mu=0, b=b)
        beta3 = pm.Laplace('beta3', mu=0, b=b)

        y_hat = beta1 * x1 + beta2 * x2 + beta3 * x3
        y_obs = pm.Normal('y_obs', mu=y_hat, tau=1.0, observed=y)

        # Draw 20000 samples with the slice sampler, spread over 3 threads
        trace = pm.psample(draws=20000, step=pm.Slice(), threads=3)
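
For the PyMC2 route, the usual pattern behind tools like IPython.parallel is to run several independent chains in separate processes, each with its own seed, and pool the draws afterwards. Below is a minimal sketch of that pattern using only the standard library. Note that run_chain is a hypothetical stand-in (a toy random-walk Metropolis sampler for a single coefficient with a Laplace prior), not PyMC code, and the toy data, step size, and iteration counts are assumptions chosen for illustration. In a real script, its body would build the PyMC2 model and call sample.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def log_post(beta, xs, ys, b=1.0):
    """Unnormalized log posterior: Laplace(0, b) prior, unit-variance Normal likelihood."""
    log_prior = -abs(beta) / b
    log_lik = -0.5 * sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return log_prior + log_lik


def run_chain(seed, n_iter=2000, burn=500, step=0.3):
    """Hypothetical stand-in for one MCMC run; returns the post burn-in draws of beta."""
    rng = random.Random(seed)      # per-chain RNG, so chains differ only by seed
    data_rng = random.Random(0)    # fixed seed: every chain sees the same toy data
    xs = [data_rng.gauss(0, 1) for _ in range(50)]
    ys = [2.0 * x + data_rng.gauss(0, 1) for x in xs]

    beta = 0.0
    lp = log_post(beta, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = beta + rng.gauss(0, step)
        lp_prop = log_post(proposal, xs, ys)
        # Metropolis accept/reject step
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            beta, lp = proposal, lp_prop
        if i >= burn:
            draws.append(beta)
    return draws


def run_chains(seeds, n_workers=None):
    """Run one chain per seed, each in its own process; returns a list of chains."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_chain, seeds))


if __name__ == "__main__":
    chains = run_chains([1, 2, 3])
    pooled = [draw for chain in chains for draw in chain]
    print(len(chains), "chains,", len(pooled), "pooled draws")
    print("pooled posterior mean of beta:", sum(pooled) / len(pooled))
```

The function handed to the pool has to be defined at module level so it can be pickled, and the if __name__ == "__main__" guard keeps worker processes from re-running the launch code on platforms that spawn rather than fork. Distinct seeds matter: chains started with the same seed would just duplicate each other.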