I am trying to run a simple linear regression using PyMC3. Here is a snippet of the code:
import numpy as np
from pymc3 import Model, sample, Normal, HalfCauchy
import pymc3 as pm
X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
with Model() as multiple_regression_model:
    beta = Normal('beta', mu=0, sd=1000, shape=2)
    sigma = HalfCauchy('sigma', 1000)
    y_hat = beta[0] + X * beta[1]
    exp = Normal('y', y_hat, sigma=sigma, observed=y)

with multiple_regression_model:
    trace = sample(1000, tune=1000)

trace['beta'].mean(axis=0)
The above code runs in about 6 seconds and gives reasonable estimates for the betas: [-0.19646408, 1.00053091].
But when I try to use the dot product, things get really bad:
X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
X_aug_np = np.squeeze(np.dstack((np.ones((500, 1)), X)))

with Model() as multiple_regression_model:
    beta = Normal('beta', mu=0, sd=1000, shape=2)
    sigma = HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)
    exp = Normal('y', y_hat, sigma=sigma, observed=y)

with multiple_regression_model:
    trace = sample(1000, tune=1000)

trace['beta'].mean(axis=0)
Now the code finishes in 56 seconds and the estimates are totally off: [249.52363555, -0.0000481]. I thought using the dot product would make things faster. Why is it behaving this way? Am I doing something wrong here?
This is a subtle shape and broadcasting bug: if you change the shape of beta to (2, 1), then it works.
To see why, I renamed the two models and tidied the code a bit:
import numpy as np
import pymc3 as pm

X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
X_aug_np = np.squeeze(np.dstack((np.ones((500, 1)), X)))

with pm.Model() as basic_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=2)
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = beta[0] + X * beta[1]
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)

with pm.Model() as matmul_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=(2, 1))  # column vector, so the dot product is (500, 1)
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)
How would you have found that out? Since the models looked the same but were not sampling similarly, I ran
print(matmul_model.check_test_point())
print(basic_model.check_test_point())
which computes the log probability of the variables at a sensible default. These did not match up, so I checked exp.tag.test_value.shape, and found out it was (500, 500) when I expected it to be (500, 1)
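Concretely, the diagnostic looks like this. The sketch below rebuilds the buggy variant of matmul_model (beta with shape (2,)) so the mismatch is visible; buggy_matmul_model is a name I made up:

# Rebuild the buggy variant: beta as a flat vector of shape (2,).
with pm.Model() as buggy_matmul_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=2)
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)  # shape (500,), not (500, 1)
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)

# Log probability of each variable at the default test point; the logp of
# the observed variable differs wildly between the two models.
print(buggy_matmul_model.check_test_point())
print(matmul_model.check_test_point())

print(exp.tag.test_value.shape)  # (500, 500) instead of (500, 1)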
. Shape handling is super hard in probabilistic programming, and this happened because exp broadcasts y_hat, sigma, and y together.
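You can reproduce the broadcast in plain NumPy (an illustration of mine, separate from the models above): with beta of shape (2,), the dot product has shape (500,), and broadcasting a (500,) array against the (500, 1) observations yields a (500, 500) result:

import numpy as np

y_hat_flat = np.dot(np.ones((500, 2)), np.zeros(2))      # shape (500,), like beta with shape (2,)
y_hat_col = np.dot(np.ones((500, 2)), np.zeros((2, 1)))  # shape (500, 1), like beta with shape (2, 1)
y_obs = np.zeros((500, 1))

print((y_obs - y_hat_flat).shape)  # (500, 500): every observation paired with every prediction
print((y_obs - y_hat_col).shape)   # (500, 1): elementwise, as intended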
As an added problem, I could not get matmul_model to sample on my machine without setting cores=1, chains=4.
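In code, that workaround is just:

with matmul_model:
    trace = pm.sample(1000, tune=1000, cores=1, chains=4)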