
Work with the actual hypotheses in pymc3 - Porting examples from ThinkBayes to pymc3


I am porting some of the examples presented in "Think Bayes" by Allen Downey to pymc3.

His great book provides some introductory examples of Bayesian methods, all implemented with Allen's own library.

One of them is the "Train Problem", where you need to estimate the number of trains a company has based on the numbers you see painted on the trains (each train is numbered from 1 to N).

The likelihood for this problem is basically:

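def likelihood(data, hypo):
    # A train numbered `data` cannot be seen if the company has only `hypo` trains
    if data > hypo:
        return 0
    # Otherwise each of the numbers 1..hypo is equally likely
    return 1 / hypo

# Multiply each hypothesis' posterior by the likelihood of every observed number
for data in stream:
    for hypo in hypotheses:
        posterior[hypo] *= likelihood(data, hypo)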

Here, data is the number you've seen on a train.
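For reference, the grid update described above can be run as plain Python with no pymc3 involved. This is a minimal sketch, assuming a uniform prior over 1 to 1000 trains and a single sighting of train 60 (both are illustrative values); the `update` helper is hypothetical and simply normalizes after multiplying in the likelihoods.

```python
def likelihood(data, hypo):
    """P(seeing train number `data` | the company has `hypo` trains)."""
    if data > hypo:
        return 0.0
    return 1.0 / hypo


def update(posterior, stream):
    """Multiply in the likelihood of each observation, then normalize."""
    for data in stream:
        for hypo in posterior:
            posterior[hypo] *= likelihood(data, hypo)
    total = sum(posterior.values())
    return {hypo: p / total for hypo, p in posterior.items()}


# Illustrative assumptions: uniform prior over 1..1000 trains, one sighting of train 60
hypotheses = range(1, 1001)
prior = {hypo: 1.0 for hypo in hypotheses}
posterior = update(dict(prior), [60])

# Posterior mean of the number of trains
posterior_mean = sum(hypo * p for hypo, p in posterior.items())
print(round(posterior_mean, 1))
```

Passing several sightings, e.g. [60, 30, 90], multiplies in one 1/hypo factor per sighting, which is exactly what the nested loop does.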

How can I define that custom likelihood in pymc3? I'm using DensityDist to create my own likelihood function, but the one I'm replicating depends on the hypothesis, which ranges from 1 to N (let's say N = 100), and in pymc3 I couldn't find a way to get the actual values out of the tensors.


Solution

  • This problem is also known as the German Tank problem, because during WWII the Allies tried to estimate the number of German tanks from the serial numbers of captured tanks.

    I think the problem can be solved with the following model:

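    import numpy as np
    import pymc3 as pm

    y = np.array([60])  # observed train numbers (example data)

    with pm.Model() as model:
        # N cannot be smaller than the largest number observed
        N = pm.DiscreteUniform('N', lower=y.max(), upper=y.max()*10)
        # trains are numbered 1..N, each equally likely to be seen
        y_obs = pm.DiscreteUniform('y', lower=1, upper=N, observed=y)

        trace = pm.sample(10000)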
    

    Depending on your actual problem, you may relax the discrete assumption, which is usually a reasonable approximation, and use a continuous distribution such as Uniform:

    with pm.Model() as model:
        # Continuous relaxation: N is now a real number in [y.max(), 10 * y.max()]
        N = pm.Uniform('N', lower=y.max(), upper=y.max()*10)
        y_obs = pm.Uniform('y', lower=0, upper=N, observed=y)
    
        trace = pm.sample(1000)
    

    One advantage of relaxing the discrete assumption is that you can now use NUTS. In the previous model, by contrast, you are restricted to Metropolis, because NUTS relies on gradients and cannot sample discrete variables.