Tags: bayesian, pymc, mcmc

PyMC code gives unusual results


I tried to fit a logistic regression model using PyMC. However, the diagnostic plots show very high autocorrelations, and repeated runs of the sampler sometimes give very different posterior results, so maybe I'm not using PyMC properly.

The model is the following:

Y_i ~ Bernoulli(p_i)
logit(p_i) = b0 + b1*x1_i + b2*x2_i

The implementation is this:

import pymc as pm
import numpy as np

# Vague Normal priors on the coefficients (mean 0, precision 1e-6)
b0 = pm.Normal('b0', 0., 1e-6, value=0.)
b1 = pm.Normal('b1', 0., 1e-6, value=0.)
b2 = pm.Normal('b2', 0., 1e-6, value=0.)

# Predictors and binary outcome (n = 9)
x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

# Success probability via the inverse-logit of the linear predictor
@pm.deterministic
def p(b0=b0, b1=b1, b2=b2, x1=x1, x2=x2):
    return np.exp(b0 + b1*x1 + b2*x2)/(1.0 + np.exp(b0 + b1*x1 + b2*x2))

# Bernoulli likelihood for the observed outcomes
obs = pm.Bernoulli('obs', p=p, value=value, observed=True)

m = pm.MCMC([obs, b0, b1, b2])

When I sample with m.sample(500000, 200000, 50) (500,000 iterations, a burn-in of 200,000, and thinning by 50) and plot the resulting posterior distributions, I get this:

[Diagnostic plots for b0, b1 and b2 omitted]
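
The sampling and plotting calls behind these figures are roughly the following (a sketch assuming PyMC 2.x; pm.Matplot is its bundled diagnostics module, and the exact plotting call may differ from what produced the images):

m.sample(iter=500000, burn=200000, thin=50)
pm.Matplot.plot(m)  # trace, autocorrelation and histogram panels for b0, b1, b2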

In a second attempt to get better results, I used @pm.observed:

import pymc as pm
import numpy as np

# Vague Normal priors on the coefficients (mean 0, precision 1e-6)
b0 = pm.Normal('b0', 0., 1e-6, value=0.)
b1 = pm.Normal('b1', 0., 1e-6, value=0.)
b2 = pm.Normal('b2', 0., 1e-6, value=0.)

x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

# Linear predictor on the logit scale
p = b0 + b1*x1 + b2*x2

# Custom observed log-likelihood: Bernoulli with an inverse-logit link
@pm.observed
def obs(p=p, value=value):
    return pm.bernoulli_like(value, pm.invlogit(p))

m = pm.MCMC([obs, p, b0, b1, b2])

But it also produces high autocorrelations.
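
A rough way to quantify this is to compute the lag-1 autocorrelation of each trace directly; the helper below is a hypothetical check using only NumPy and the m object defined above, not something from the original post:

def lag1_autocorrelation(trace):
    # Lag-1 sample autocorrelation of a 1-D MCMC trace
    x = np.asarray(trace, dtype=float) - np.mean(trace)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# Values near 1 after m.sample(...) indicate poor mixing
for name in ('b0', 'b1', 'b2'):
    print(name, lag1_autocorrelation(m.trace(name)[:]))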

I increased the sample size without much success. What am I missing?


Solution

  • You definitely have some convergence issues. Part of the problem is that you have a very small sample size (n=9), but are trying to fit 3 parameters. I get a little better mileage by block-updating the parameters, but the intercept still does poorly:

    import pymc as pm
    import numpy as np
    
    # A single vector-valued stochastic, so all three coefficients are proposed as a block
    beta = pm.Normal('beta', 0., 0.001, value=[0.]*3)
    
    x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
    x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
    value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])
    
    # Inverse-logit link applied to the linear predictor
    p = pm.Lambda('p', lambda b=beta: pm.invlogit(b[0] + b[1]*x1 + b[2]*x2))
    
    obs = pm.Bernoulli('obs', p=p, value=value, observed=True)
    
    

    (BTW, when you are logit-transforming your parameters, you don't need super-diffuse priors on the betas -- a precision smaller than 0.001 is pretty fruitless.) Things are improved further by using an adaptive Metropolis sampler on the betas:

    M.use_step_method(pm.AdaptiveMetropolis, M.beta)
    
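    To make the snippet above runnable end to end, the sampler can be assembled and run roughly as follows (the iteration counts are illustrative, not from the original answer):

    M = pm.MCMC([obs, beta, p])
    M.use_step_method(pm.AdaptiveMetropolis, M.beta)
    M.sample(iter=200000, burn=100000, thin=10)
    pm.Matplot.plot(M)  # per-parameter trace, autocorrelation and histogram
    print(M.stats())    # posterior means, standard deviations and intervals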

    This results in convergence, but as you can see, there is little information in the data to move the posteriors away from the priors:

    [Posterior summary plot omitted]