python pymc mcmc

pymc unexpected model output


I'm trying to use PyMC to determine the distribution of ad click-through rates (CTRs). Let's say we have 1000 ads, and I have measurements of clicks and views for every ad. I assume that the underlying distribution of the ad CTRs is a Beta distribution, and I would like to use PyMC to estimate its parameters. In the following snippets I call these parameters unknown_alpha and unknown_beta.
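
Written out, the assumption is roughly the following two-level model. The symbols are notation introduced here for illustration only: p_i is the CTR of ad i, v_i its number of views, c_i its number of clicks, and alpha and beta stand for unknown_alpha and unknown_beta.

```latex
\begin{align*}
p_i &\sim \mathrm{Beta}(\alpha, \beta), \\
c_i \mid v_i, p_i &\sim \mathrm{Binomial}(v_i, p_i), \qquad i = 1, \dots, 1000.
\end{align*}
```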

Here is how one could generate an example test set:

from scipy.stats import beta
from scipy.stats import geom
from scipy.stats import binom

def generate_example_data(data_size=1000, unknown_alpha=30, unknown_beta=100):
    ctrs = beta.rvs(a=unknown_alpha, b=unknown_beta, size=data_size)

    data_views = geom.rvs(0.001, size=data_size)
    data_clicks = []
    for ctr, views in zip(ctrs, data_views):
        data_clicks.append(binom.rvs(p=ctr, n=views))

    return data_views, data_clicks

And here is how I tried to use PyMC:

import pymc 

def model(data_views, data_clicks):
    ctr_prior = pymc.Beta('ctr_prior', alpha=1.0, beta=1.0)
    views = pymc.Geometric('views', 0.01, observed=True, value=data_views)
    clicks = pymc.Binomial('clicks', n=views, p=ctr_prior, observed=True, value=data_clicks)

    model = pymc.Model([ctr_prior, views, clicks]) 

    mc = pymc.MCMC(model)  
    mc.sample(iter=5000, burn=5000) 

    return mc.trace('ctr_prior')[:]

views, clicks = generate_example_data()
model(views, clicks)

Output: array([ 0.])

I know that the model is not yet complete enough to infer unknown_alpha and unknown_beta, but I don't understand why I just get array([ 0.]). I expected to get a trace with 5000 elements.

Can anybody explain where I went wrong?

Cheers!


Solution

  • My guess would be the mc.sample(iter=5000, burn=5000) line. You run 5000 iterations and then discard the first 5000 as burn-in, so no samples are left in the trace. To keep 5000 samples, use mc.sample(iter=10000, burn=5000).
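
The arithmetic behind that guess can be sketched without PyMC. The helper kept_samples below is hypothetical (it is not part of PyMC), and the sketch assumes no thinning, so the number of retained draws is simply the number of iterations minus the burn-in.

```python
def kept_samples(n_iter, burn):
    """Draws left after discarding the first `burn` of `n_iter` iterations.

    Hypothetical helper for illustration; assumes no thinning.
    """
    return max(n_iter - burn, 0)


# A plain list stands in for the chain of MCMC draws.
chain = list(range(5000))
after_burn = chain[5000:]            # burn == iter: everything is discarded
print(len(after_burn), kept_samples(5000, 5000))

chain = list(range(10000))
after_burn = chain[5000:]            # burn < iter: the second half survives
print(len(after_burn), kept_samples(10000, 5000))
```

With iter equal to burn, the slice is empty. Doubling iter while keeping burn=5000 leaves 5000 draws, which is the trace length the question expected.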