Tags: python, bayesian, pymc3, mcmc

How to sample from a custom distribution when parameters are known?


The goal is to draw samples from a distribution whose parameters are known.

For example, the self-defined distribution is p(X|theta), where theta is a parameter vector of K dimensions and X is a random vector of N dimensions.

We know that (1) theta is known; (2) p(X|theta) is NOT known, but p(X|theta) ∝ f(X, theta), where f is a known function.

Can PyMC3 perform such sampling from p(X|theta), and if so, how?

The purpose is not to sample from the posterior distribution of the parameters, but to sample from a self-defined distribution.

Starting with a simple example of sampling from a Bernoulli distribution, I did the following:

import pymc3 as pm
import numpy as np
import scipy.stats as stats
import pandas as pd
import theano.tensor as tt

with pm.Model() as model1:
    p=0.3
    density = pm.DensityDist('density',
                             lambda x1: tt.switch( x1, tt.log(p), tt.log(1 - p) ),
                             ) #tt.switch( x1, tt.log(p), tt.log(1 - p) ) is the log likelihood from pymc3 source code

with model1:
    step = pm.Metropolis()
    samples = pm.sample(1000, step=step)

I expected the result to be 1000 binary digits, with the proportion of 1s being about 0.3. However, I got strange results where very large numbers occur in the output.

I know something is wrong. Please advise on how to correctly write PyMC3 code for such non-posterior MCMC sampling problems.


Solution

  • Prior predictive sampling (for which you should be using pm.sample_prior_predictive()) involves only using the RNGs provided by the RandomVariable objects in your compute graph. By default, DensityDist does not implement a RNG, but does provide the random parameter for this purpose, so you'll need to use that. The log-likelihood is only evaluated with respect to observables, so it plays no role here.

    A simple way to generate a valid RNG for an arbitrary distribution is to use inverse transform sampling. In this case, one samples a uniform distribution on the unit interval and then transforms it through the inverse CDF of the desired distribution. For the Bernoulli case, the inverse CDF partitions the unit line based on the probability of success, assigning 0 to one part and 1 to the other.
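
    For intuition, here is a minimal NumPy sketch of inverse transform sampling for a continuous case, an exponential distribution (this snippet is an added illustration, not part of the original answer; the rate lam is arbitrary):

    import numpy as np

    gen = np.random.default_rng(2019)
    lam = 2.0                        # rate of the target Exponential(lam)
    u = gen.uniform(size=10_000)     # U ~ Uniform(0, 1)
    x = -np.log1p(-u) / lam          # inverse CDF: F^-1(q) = -ln(1 - q) / lam
    print(x.mean())                  # should be close to 1 / lam = 0.5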

    Here is a factory-like implementation that creates a Bernoulli RNG compatible with pm.DensityDist's random parameter (i.e., accepts point and size kwargs).

    def get_bernoulli_rng(p=0.5):
    
        def _rng(point=None, size=1):
            # Bernoulli inverse CDF, given p (prob of success)
            _icdf = lambda q: np.uint8(q < p)
    
            return _icdf(pm.Uniform.dist().random(point=point, size=size))
    
        return _rng
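
    As a quick sanity check (an added illustration, not part of the original answer), the factory's RNG can also be called directly; assuming the PyMC3 3.x API, pm.Uniform.dist().random accepts a size keyword:

    bern_rng = get_bernoulli_rng(0.3)
    draws = bern_rng(size=10_000)   # array of 0s and 1s
    draws.mean()                    # should be close to p = 0.3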
    

    So, to fill out the example, it would go something like

    with pm.Model() as m:
        p = 0.3
        y = pm.DensityDist('y', lambda x: tt.switch(x, tt.log(p), tt.log(1-p)),
                           random=get_bernoulli_rng(p))
        prior = pm.sample_prior_predictive(random_seed=2019)
    
    prior['y'].mean() # 0.306
    

    Obviously, this could equally be done with random=pm.Bernoulli.dist(p).random, but the above illustrates generically how one could do this with arbitrary distributions, given their inverse CDF, i.e., you only need to modify _icdf and the parameters.
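
    For reference, here is a minimal sketch of that alternative, assuming the same PyMC3 3.x API as in the example above (the model name m2 is just illustrative):

    with pm.Model() as m2:
        p = 0.3
        # reuse the RNG of the built-in Bernoulli distribution instead of a custom inverse CDF
        y = pm.DensityDist('y', lambda x: tt.switch(x, tt.log(p), tt.log(1-p)),
                           random=pm.Bernoulli.dist(p).random)
        prior = pm.sample_prior_predictive(random_seed=2019)

    prior['y'].mean()  # should again be close to 0.3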