Search code examples
regressionpythonpymc

Fitting logistic regression with PyMC: ZeroProbability error


To teach myself PyMC I am trying to define a simple logistic regression. But I get a ZeroProbability error, and does not understand exactly why this happens or how to avoid it.

Here is my code:

import pymc
import numpy as np

x = np.array([85, 95, 70, 65, 70, 90, 75, 85, 80, 85])
y = np.array([1., 1., 0., 0., 0., 1., 1., 0., 0., 1.])

w0 = pymc.Normal('w0', 0, 0.000001)    # uninformative prior (any real number)
w1 = pymc.Normal('w1', 0, 0.000001)    # uninformative prior (any real number)

@pymc.deterministic
def logistic(w0=w0, w1=w1, x=x):
    return 1.0 / (1. + np.exp(-(w0 + w1 * x)))

observed = pymc.Bernoulli('observed', logistic, value=y, observed=True)

And here is the trace back with the error message:

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-43ed68985dd1>", line 24, in <module>
    observed = pymc.Bernoulli('observed', logistic, value=y, observed=True)
  File "/usr/local/lib/python2.7/site-packages/pymc/distributions.py", line 318, in __init__
    **arg_dict_out)
  File "/usr/local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
    if not isinstance(self.logp, float):
  File "/usr/local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
    raise ZeroProbability(self.errmsg)
ZeroProbability: Stochastic observed's value is outside its support,
 or it forbids its parents' current values.

I suspect np.exp to be causing the trouble, since it returns inf when the linear equation becomes too high. I know there are other ways to define a logistic regression using PyMC (her is one), but I am interested in knowing why this approach does not work, and how I can define the regression using the Bernoulli object instead of using bernoulli_like


Solution

  • When you create a your normal stochastastic with pymc.Normal('w0', 0, 0.000001), PyMC2 initializes the value with a random draw from the prior distribution. Since your prior is so diffuse, this can be a value which is so unlikely that the posterior is effectively zero. To fix, just request a reasonable initial value for your Normal:

    w0 = pymc.Normal('w0', 0, 0.000001, value=0)
    w1 = pymc.Normal('w1', 0, 0.000001, value=0)
    

    Here is a notebook with a few more details.