python, theano, pymc3

Bayes factors in pymc3


I'm interested in computing Bayes factors to compare two models in PyMC 3. According to this website, in PyMC 2 the procedure seems relatively straightforward: include a Bernoulli random variable and a custom likelihood function that returns the likelihood of the first model when the Bernoulli variable's value is 0, and the likelihood of the second model when the value is 1. In PyMC 3, however, things get more complicated because the stochastic nodes need to be Theano variables.
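
For reference, here is a minimal sketch of that PyMC 2 recipe (the data and the two fixed success probabilities below are made up for illustration):

import pymc  # PyMC 2

successes, trials = 7, 10  # hypothetical observed data
model_index = pymc.Bernoulli('model_index', p=0.5)

@pymc.observed
def y(value=successes, model_index=model_index):
    # return the log-likelihood of whichever model is currently selected
    if model_index:
        return pymc.binomial_like(value, n=trials, p=0.7)  # model 1
    return pymc.binomial_like(value, n=trials, p=0.5)      # model 0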

My two likelihood functions are Binomials, so I guess I need to rewrite this class:

# Imports as in the PyMC 3 source this class comes from:
import theano.tensor as T
from scipy import stats

from pymc3.distributions.dist_math import bound, binomln, logpow
from pymc3.distributions.distribution import Discrete, draw_values, generate_samples


class Binomial(Discrete):
    R"""
    Binomial log-likelihood.
    The discrete probability distribution of the number of successes
    in a sequence of n independent yes/no experiments, each of which
    yields success with probability p.
    .. math:: f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}
    ========  ==========================================
    Support   :math:`x \in \{0, 1, \ldots, n\}`
    Mean      :math:`n p`
    Variance  :math:`n p (1 - p)`
    ========  ==========================================
    Parameters
    ----------
    n : int
        Number of Bernoulli trials (n >= 0).
    p : float
        Probability of success in each trial (0 < p < 1).
    """
    def __init__(self, n, p, *args, **kwargs):
        super(Binomial, self).__init__(*args, **kwargs)
        self.n = n
        self.p = p
        self.mode = T.cast(T.round(n * p), self.dtype)

    def random(self, point=None, size=None, repeat=None):
        n, p = draw_values([self.n, self.p], point=point)
        return generate_samples(stats.binom.rvs, n=n, p=p,
                                dist_shape=self.shape,
                                size=size)

    def logp(self, value):
        n = self.n
        p = self.p

        return bound(
            binomln(n, value) + logpow(p, value) + logpow(1 - p, n - value),
            0 <= value, value <= n,
            0 <= p, p <= 1)

Any suggestions on where to start?


Solution

  • You could try something like this:

    import numpy as np
    import pymc3 as pm

    with pm.Model() as model:
        p = np.array([0.5, 0.5])  # prior probability of each model
        model_index = pm.Categorical('model_index', p=p)
        model0 = ...  # define one model here
        model1 = ...  # define the other model here
        m = pm.math.switch(pm.math.eq(model_index, 0), model0, model1)
        # pass m to a prior or likelihood, for example
        y = pm.SomeDistribution('y', m, observed=data)

        step0 = pm.ElemwiseCategorical(vars=[model_index], values=[0, 1])
        step1 = pm.NUTS()
        trace = pm.sample(5000, step=[step0, step1])
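
    As a concrete, self-contained instance of this sketch (everything here, the data, the Beta priors, and the names, is illustrative, not part of the original answer), suppose the two models are Bernoulli likelihoods that differ only in their prior on the success probability:

    import numpy as np
    import pymc3 as pm

    data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # fake coin-flip data

    with pm.Model() as model:
        p = np.array([0.5, 0.5])
        model_index = pm.Categorical('model_index', p=p)
        theta0 = pm.Beta('theta0', 4, 8)  # model 0: prior leaning towards tails
        theta1 = pm.Beta('theta1', 8, 4)  # model 1: prior leaning towards heads
        theta = pm.math.switch(pm.math.eq(model_index, 0), theta0, theta1)
        y = pm.Bernoulli('y', p=theta, observed=data)

        step0 = pm.ElemwiseCategorical(vars=[model_index], values=[0, 1])
        step1 = pm.NUTS()
        trace = pm.sample(5000, step=[step0, step1])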
    

    Then you compute the Bayes factor (discarding burn-in samples first if necessary):

    pm.traceplot(trace);
    pM1 = trace['model_index'].mean()  # posterior probability of model 1
    pM0 = 1 - pM1                      # posterior probability of model 0
    pM0, pM1, (pM0/pM1) * (p[1]/p[0])  # Bayes factor: posterior odds / prior odds
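
    The last expression is the usual identity relating the Bayes factor to posterior and prior odds; with equal prior probabilities the correction term p[1]/p[0] is 1:

    BF_{01} = \frac{p(y \mid M_0)}{p(y \mid M_1)}
            = \frac{p(M_0 \mid y) / p(M_1 \mid y)}{p(M_0) / p(M_1)}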
    

    You may also want to check how to use information criteria to compare models; see an example here.
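
    For instance, a minimal sketch using WAIC (the data and priors are the same illustrative assumptions as above; each model is fitted separately rather than mixed with an index variable):

    import numpy as np
    import pymc3 as pm

    data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # fake coin-flip data

    with pm.Model() as model0:
        theta = pm.Beta('theta', 4, 8)
        pm.Bernoulli('y', p=theta, observed=data)
        trace0 = pm.sample(2000)

    with pm.Model() as model1:
        theta = pm.Beta('theta', 8, 4)
        pm.Bernoulli('y', p=theta, observed=data)
        trace1 = pm.sample(2000)

    waic0 = pm.waic(trace0, model0)  # lower WAIC = better expected predictive fit
    waic1 = pm.waic(trace1, model1)
    print(waic0, waic1)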