Could someone give some general instructions on how to parallelize PyMC MCMC code? I am trying to run LASSO regression following the example given here. I read somewhere that parallel sampling is done by default, but do I still need to use something like Parallel Python to get it to work?
Here is some reference code that I would like to be able to parallelize on my machine.
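import numpy as np
import pymc
from scipy.stats import norm

# n and b are not defined in the original post; the values below are assumptions.
n = 1000  # assumed number of observations
b = 1.0   # assumed Laplace scale for the LASSO priors
x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)  # nearly collinear with -x1
x3 = norm.rvs(0, 1, size=n)
X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3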
beta1_lasso = pymc.Laplace('beta1', mu=0, tau=1.0 / b)
beta2_lasso = pymc.Laplace('beta2', mu=0, tau=1.0 / b)
beta3_lasso = pymc.Laplace('beta3', mu=0, tau=1.0 / b)
@pymc.deterministic
def y_hat_lasso(beta1=beta1_lasso, beta2=beta2_lasso, beta3=beta3_lasso, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3
Y_lasso = pymc.Normal('Y', mu=y_hat_lasso, tau=1.0, value=y, observed=True)
lasso_model = pymc.Model([Y_lasso, beta1_lasso, beta2_lasso, beta3_lasso])
lasso_MCMC = pymc.MCMC(lasso_model)
# 20000 iterations, discard the first 5000 as burn-in, keep every 2nd draw
lasso_MCMC.sample(iter=20000, burn=5000, thin=2)
It looks like you are using PyMC2. As far as I know, with PyMC2 you must use a general-purpose Python approach to parallel computation, such as IPython.parallel. There are many ways to do this, but all the ones I know of are somewhat complicated. Here is an example of one, which uses PyMC2, IPCluster, and Wakari.
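To make the general idea concrete, here is a minimal sketch of one such approach: run several independent chains in separate worker processes and collect the results. It uses only the standard library, and the toy random-walk Metropolis sampler in run_chain is a stand-in for a PyMC2 model, not PyMC code. The names log_post, run_chain, and run_chains, as well as the data values, are made up for illustration.

```python
import math
import random
from functools import partial
from multiprocessing import Pool


def log_post(beta, xs, ys, b=1.0):
    """Toy log-posterior: unit-variance Normal likelihood, Laplace(0, b) prior on one slope."""
    log_lik = -0.5 * sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    log_prior = -abs(beta) / b
    return log_lik + log_prior


def run_chain(seed, n_draws=2000, burn=500):
    """Run one random-walk Metropolis chain (a stand-in for one sampler run)."""
    rng = random.Random(seed)   # each chain gets its own seeded generator
    xs = [0.5, 1.0, 1.5, 2.0]   # made-up data, for illustration only
    ys = [1.1, 1.9, 3.2, 3.9]
    beta = 0.0
    current = log_post(beta, xs, ys)
    draws = []
    for i in range(n_draws):
        proposal = beta + rng.gauss(0.0, 0.5)
        candidate = log_post(proposal, xs, ys)
        # Accept with probability min(1, exp(candidate - current)).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn:
            draws.append(beta)
    return draws


def run_chains(seeds, n_procs=None, **kwargs):
    """Run one chain per seed in separate worker processes; return a list of chains."""
    worker = partial(run_chain, **kwargs)
    with Pool(processes=n_procs) as pool:
        return pool.map(worker, seeds)
```

Calling run_chains([1, 2, 3], n_procs=3) should be done under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn rather than fork worker processes. Giving each chain a different seed matters: chains that share a seed would produce identical draws.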
In PyMC3, parallel sampling is implemented in the psample function, but your reference code will need to be updated to the PyMC3 format:
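import pymc3 as pm

# x1, x2, x3, y, and b are defined as in the question's code.
with pm.Model() as model:
    beta1 = pm.Laplace('beta1', mu=0, b=b)
    beta2 = pm.Laplace('beta2', mu=0, b=b)
    beta3 = pm.Laplace('beta3', mu=0, b=b)
    y_hat = beta1 * x1 + beta2 * x2 + beta3 * x3
    y_obs = pm.Normal('y_obs', mu=y_hat, tau=1.0, observed=y)
    # Sample inside the model context so the step method can find the model.
    trace = pm.psample(draws=20000, step=pm.Slice(), threads=3)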
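One detail of the conversion is the Laplace prior: the PyMC2 code passes tau=1.0 / b, while the PyMC3 code passes b=b. Assuming, as the two listings suggest, that tau is a rate and b is a scale with tau = 1 / b, the two forms describe the same density. A quick standard-library check, with hypothetical function names and an arbitrary value of b:

```python
import math


def laplace_logpdf_tau(x, mu, tau):
    """Laplace log-density in rate form: log(tau / 2) - tau * |x - mu|."""
    return math.log(tau / 2.0) - tau * abs(x - mu)


def laplace_logpdf_b(x, mu, b):
    """Laplace log-density in scale form: -log(2 b) - |x - mu| / b."""
    return -math.log(2.0 * b) - abs(x - mu) / b


b = 2.0  # arbitrary scale, chosen only for this check
for x in (-1.0, 0.0, 0.3, 2.0):
    # The rate form with tau = 1 / b should match the scale form with b.
    assert math.isclose(laplace_logpdf_tau(x, 0.0, 1.0 / b),
                        laplace_logpdf_b(x, 0.0, b))
```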