Tags: python, probability, bayesian, pymc3, stan

Direct log probability increment (target += ...) in PyMC3 as in Stan


In the probabilistic programming language Stan, given the following data and parameters blocks:

data {
    int N;
    real y[N];   // 'data' is a reserved word in Stan, so the array is named y
}

parameters {
    real mu;
}

The following model blocks are equivalent:

"Sample notation":

model {
    y ~ normal(mu, 1);
}

"Direct log probability increment notation":

model {
    for (n in 1:N)
        target += normal_lpdf(y[n] | mu, 1);
}

where target accumulates the total log density that the MCMC (NUTS) sampler explores. The benefit of the latter notation is greater flexibility in defining the log probability model: the value whose log density is added to target (y[n] above) can be any expression, including quantities the model itself computes rather than raw data.

The sample notation can be written in PyMC3 (Python) as:

with pm.Model() as model:
    mu = pm.Flat('mu')  # improper flat prior, matching Stan's unconstrained real mu
    x = pm.Normal('x', mu=mu, sd=1.0, observed=data)
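
For concreteness, here is a minimal self-contained sketch of this model with made-up data (the data-generating line and draw counts are illustrative only); pm.sample uses NUTS by default:

import numpy as np
import pymc3 as pm

# Hypothetical data, for illustration only.
data = 5 + 3 * np.random.randn(20)

with pm.Model() as model:
    mu = pm.Flat('mu')
    x = pm.Normal('x', mu=mu, sd=1.0, observed=data)
    trace = pm.sample(1000, tune=1000)  # NUTS by default

print(trace['mu'].mean())  # should land near the sample mean of data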

Question: How can I apply the same direct log probability increment notation, where I specify the samples, in PyMC3?


Solution

  • You can do this using pm.Potential, as in:

    import numpy as np
    import pymc3 as pm
    
    data = 5 + 3 * np.random.randn(20)
    
    with pm.Model() as model:
        x = pm.Flat('x')
        # Increment the model's log density by the log pdf of the data under
        # Normal(x, 1); PyMC3 sums the expression, like Stan's target +=.
        pm.Potential('y', pm.Normal.dist(x, 1).logp(data))
    

    You can verify that this is doing the right thing: the Flat prior contributes zero to the log density, so the model's log probability at x = 0 should equal the sum of standard normal log pdfs of the data:

    import scipy.stats as st
    
    # Both printed values should match (up to floating point error).
    print(st.norm(0, 1).logpdf(data).sum(), model.logp({'x': 0}))
    

    Note that Potential has to contain a Theano expression -- pm.Normal.dist(x, 1).logp happens to return one. You could also write the log pdf explicitly:

    import theano.tensor as tt
    
    with pm.Model() as model:
        x = pm.Flat('x')
        # Hand-written log pdf of Normal(x, 1) evaluated at the data,
        # summed over all observations.
        pm.Potential('y', -0.5 * tt.sum((x - data) ** 2)
                          - 0.5 * len(data) * np.log(2 * np.pi))
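
    Just as in Stan, the expression inside Potential need not involve the observed data directly; it can be any Theano expression, including one built from computed quantities. A minimal sketch of that flexibility (the exp transform and the name z_term are made up for illustration):

    import pymc3 as pm
    import theano.tensor as tt
    
    with pm.Model() as model:
        x = pm.Flat('x')
        z = tt.exp(x)  # a quantity computed from the parameters, not data
        # Add a log density term on the computed quantity, analogous to
        # target += normal_lpdf(z | 0, 1) in Stan.
        pm.Potential('z_term', pm.Normal.dist(0.0, 1.0).logp(z))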