tensorflow, sampling, montecarlo, tensorflow-probability

TensorFlow Probability Sampling Takes a Long Time


I am trying to use TFP for a sampling process: draw samples from a Beta distribution, then feed the result as the probability input to draw samples from a Binomial distribution. It took forever to run.

Am I supposed to run it this way, or is there a better way?

'''

import tensorflow_probability as tfp
tfd = tfp.distributions

m = 100000 # sample size

### first sample from Beta distribution 
### and feed the result as probability to the binomial distribution sampling
s = tfd.Sample(
    tfd.Beta(2,2),
    sample_shape = m
)
phi = s.sample()

### Second sample from Binomial distribution 
### !!! it took forever to run...
s2 = tfd.Sample(
    tfd.Binomial(total_count=10, probs=phi),
    sample_shape = m
)

y = s2.sample() # not working well


### scipy code which works well:
from scipy import stats
m = 100000 # sample size
phi = stats.beta.rvs(2, 2, size = m)
y = stats.binom.rvs(10, phi, size = m)

'''


Solution

  • TFP distributions support a concept called "batch shape". Here, by giving probs=phi with phi.shape = [100000], you are effectively creating a "batch" of 100k Binomials. Wrapping that batch in tfd.Sample with sample_shape = m then draws 100k times from each of them, which tries to create 1e10 samples, and that is going to take a while! Instead, try this:

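    import tensorflow_probability as tfp
    tfd = tfp.distributions

    m = 100000  # sample size
    s = tfd.Sample(
        tfd.Beta(2,2),
        sample_shape = m
    )
    phi = s.sample()  # shape [m]

    ### Second sample from Binomial distribution:
    ### probs=phi already gives a batch of m Binomials, so no tfd.Sample wrapper
    s2 = tfd.Binomial(total_count=10, probs=phi)

    y = s2.sample()  # one draw per Binomial, shape [m]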
    

    Alternatively, use tfd.BetaBinomial!

    bb = tfd.BetaBinomial(total_count=10, concentration1=2, concentration0=2)
    bb.sample(100000)
    

    But most importantly, have a look at the example notebook talking through TFP's shape semantics: https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes

    Cheers!