Search code examples
pythonmcmcpymc

Getting the statistics of deterministic variables in PyMC


Say I have a random collection of (X,Y) points:

import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import scipy

x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=200)
y = map((lambda a: a[0] + a[1]), zip(x,y))

plt.scatter(x,y)

                    enter image description here

and that I fit simple linear regression:

std = 20.
tau=1/(std**2)
alpha = pm.Normal('alpha', mu=0, tau=tau)
beta  = pm.Normal('beta', mu=0, tau=tau)
sigma = pm.Uniform('sigma', lower=0, upper=20)

y_est = alpha + beta * x

likelihood = pm.Normal('y', mu=y_est, tau=1/(sigma**2), observed=True, value=y)

model = pm.Model([likelihood, alpha, beta, sigma, y_est])
mcmc  = pm.MCMC(model)
mcmc.sample(40000, 15000)

How can I get the distribution or the statistics of y_est[0], y_est[1], y_est[2].. (note that these variables correspond to the y values estimated for each input x value.


Solution

  • Following @Chris' advice, the following works:

    x     = pm.Uniform('x', lower=xmin, upper=xmax)
    alpha = pm.Normal('alpha', mu=0, tau=tau)
    beta  = pm.Normal('beta', mu=0, tau=tau)
    sigma = pm.Uniform('sigma', lower=0, upper=20)
    
    # The deterministic:
    y_gen = pm.Lambda('y_gen', lambda a=alpha, x=x, b=beta: a + b * x)
    

    And then draw samples from it as follows:

    mcmc = pm.MCMC([x, y_gen])
    mcmc.sample(n_total_samples, n_burn_in)
    
    x_trace = mcmc.trace('x')[:]
    y_trace = mcmc.trace('y_gen')[:]