I am trying to port this example of a latent multinomial model in PyMC2 over to PyMC3.
The aim is to estimate the actual number N of individuals in a closed population, based on a time series of T observation occasions.
There are 2^T possible sequences for each individual: if T=2, an individual can be observed never (0, 0) [let's call this sequence 1], only the first time (1, 0) [sequence 2], only the second time (0, 1) [sequence 3], or both times (1, 1) [sequence 4].
We observe which individuals produced sequences 2 to 4, so we have those counts.
However, the model depends on determining the number of individuals who exist but were never observed (sequence 1) from sampled values of N.
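For concreteness, the sequence ordering above can be generated programmatically (a quick sketch; the sort key simply reproduces that ordering):

from itertools import product
T = 2
sequences = sorted(product((0, 1), repeat=T), key=lambda seq: seq[::-1])
# [(0, 0), (1, 0), (0, 1), (1, 1)] -> sequences 1 to 4 above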
In the original PyMC2 model, this unobserved count is handled by simply appending N - sum(f) to the start of f, where f is an np.array of observed counts for sequences 2 to 2^T. In PyMC3 this must be done using Theano, and some of the other variables are also defined as Theano shared variables.
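For example, the prepending step on its own might look like this in Theano (a sketch with an arbitrary stand-in for N; full_f is a hypothetical name):

import numpy as np
import theano.tensor as tt
from theano import shared
f = shared(np.array((22, 363, 174)))
N_unseen = 1000 - f.sum()  # stand-in for N - sum(f), with 1000 an arbitrary N
full_f = tt.concatenate([tt.stack([N_unseen]), f])  # counts for sequences 1 to 2**T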
The original PyMC2 code looks like this:
from pymc import Lambda, Beta, DiscreteUniform, stochastic, multinomial_like
from numpy import reshape, ma, prod, array, append
import pdb
# Constants
T=2
Cells=4
# Observations seen by both observers, just the first observer, or just the second
f=(22,363,174)
# Indicators for observer sightings
omegas=reshape((0,0, 1,0, 0,1, 1,1), (Cells, T))
# Capture probabilities
p = Beta('p', alpha=1, beta=1, size=T)
# Cell probabilities
c = Lambda('c', lambda p=p: p**omegas * (1-p)**(1.-omegas))
# Joint probabilities
pi = Lambda('pi', lambda c=c: prod(c, 1))
# Bound total N by the number of observed individuals
N = DiscreteUniform('N', lower=sum(f), upper=10000)
@stochastic(observed=True, dtype=int)
def F(value=f, n=N, p=pi):
"""Complete multinomial for observed and unobserved quatities"""
return multinomial_like(append(n-sum(value), value), n, p)
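As a sanity check on the cell-probability construction above, a plain NumPy version with illustrative values p = (0.3, 0.9) (arbitrary numbers, not estimates) yields probabilities that sum to 1:

import numpy as np
omegas = np.reshape((0,0, 1,0, 0,1, 1,1), (4, 2))
p = np.array((0.3, 0.9))
c = p**omegas * (1 - p)**(1 - omegas)
pi = np.prod(c, axis=1)
# pi = [0.07, 0.03, 0.63, 0.27], which sums to 1 as a multinomial requires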
My first attempt at a PyMC3 version (in Python 2.7) looks like this:
from __future__ import division, absolute_import, print_function
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared
# Constants
T = 2
CELLS = 4
with pm.Model() as model_mt:
    f = shared(np.array((22, 363, 174)))
    omegas = shared(np.reshape((0,0, 1,0, 0,1, 1,1), (CELLS, T)))
    p = pm.Beta('p', alpha=1, beta=1, shape=T)
    # Treating c and pi as separate steps, as per the original example:
    # c = p ** omegas
    # pi = tt.prod(c, axis=1)
    # Shorter line of code:
    pi = tt.prod(p ** omegas, axis=1)
    N = pm.DiscreteUniform('N', lower=f.sum(), upper=10000)
    observed_values = tt.concatenate(([N - tt.sum(f)], f))
    y_obs = pm.Multinomial('y_obs', n=N, p=pi, observed=observed_values)
with model_mt:
    trace = pm.sample(5000)
But the resultant trace looks implausible. I get mean values of 0.27 for p_0 and 0.98 for p_1, which I understand to be the probabilities of seeing an individual at t=1 and at t=2 respectively, while N has a mean value of 930. But shouldn't p_1 be close to (363+174)/N?
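As a rough back-of-the-envelope check of that intuition (my own arithmetic, plugging in the posterior mean of N):

seen_second = 363 + 174      # counts I read as sequences 3 and 4, i.e. observed at t=2
print(seen_second / 930.0)   # ~0.58, far from the 0.98 posterior mean for p_1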
Here is how I would translate this model to PyMC3 (which I've been meaning to do, so thanks). The setup lines at the top just restate the data from your question so the snippet runs on its own:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
T = 2
omega = np.reshape((0,0, 1,0, 0,1, 1,1), (4, T))
f = np.array((0, 22, 363, 174))  # cell counts; the first (never-observed) entry is a placeholder
n = f.sum()  # number of individuals observed at least once
with pm.Model() as Mt2:
    p = pm.Uniform('p', 0, 1, shape=T)
    c = p**omega * (1 - p)**(1 - omega)
    π = pm.Deterministic('π', tt.prod(c, 1))
    π_obs = π[1:] / (1 - π[0])
    f_obs = pm.Potential('f_obs', pm.Multinomial.dist(n=n, p=π_obs).logp(f[1:]))
    N = pm.Uniform('N', 0, 10000)
    n_obs = pm.Potential('n_obs', pm.Binomial.dist(n=tt.floor(N), p=1 - π[0]).logp(n))
    trace = pm.sample(1000, tune=1000)
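Posterior summaries can then be read off with PyMC3's built-ins (exact output format varies by version):

pm.summary(trace)    # tabulated means and credible intervals for p, π and N
pm.traceplot(trace)  # trace and posterior density plots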
Keeping N as a continuous variable and using factor potentials for the partially observed quantities allows PyMC3 to use NUTS rather than falling back on Metropolis.
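In isolation, the factor-potential pattern looks like this (a minimal sketch with made-up numbers, just to show Potential wrapping a .dist().logp() term):

with pm.Model() as sketch:
    q = pm.Uniform('q', 0, 1)
    # contributes a binomial log-likelihood term without a formal observed variable
    pm.Potential('lik', pm.Binomial.dist(n=10, p=q).logp(7))
    tr = pm.sample(500, tune=500)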
I get an estimate of just over 611 for N. I can't recall how this compares to the PyMC2 (or Link and Barker) output, but it's considerably smaller than your estimate.
Here are my estimates of the capture probabilities: 0.325 (0.290, 0.367) for p[0] and 0.894 (0.853, 0.936) for p[1].