Tags: python, numpy, sampling

numpy - efficiently filter/select random samples with a stochastic constraint


I want to draw N random samples with NumPy, filtered so that a criterion is met. My current implementation is too slow for large values of N (say 100,000). How can I more efficiently filter these samples to meet the condition that an associated standard uniform random sample is less than f/g? There has to be a faster way to implement this code.

import numpy as np
from scipy.special import gamma
import matplotlib.pyplot as plt

# Target density f and scaled envelope g for rejection sampling.
def f(x): return 1. / gamma(3) * x * np.exp(-1 * x)
lambd = .2
c = 1 / lambd / gamma(3) * (2./(1-lambd)) ** 2 * np.exp(-1 * (1 - lambd) * (2. / (lambd - 1)))
def g(x): return c * lambd * np.exp(-1 * lambd * x)

x = np.linspace(0, 50, 1000)
samples = []
N = 100
# Accept candidates one at a time until N of them pass the filter.
while len(samples) < N:
    randou = np.random.uniform(0, 1)
    randoh = c * np.random.exponential(0.2)
    if randou <= f(randoh) / g(randoh): samples.append(randoh)

plt.hist(samples, 100, density=True, label='Simulated PDF')  # normed= is deprecated
plt.plot(x, f(x), label='True PDF', lw=2)
plt.xlim(0, 10)
plt.show()

I also tried generating all the samples in one go and then redrawing the rejected ones in a while loop, but I'm not sure how much faster this approach actually is:

samples = np.random.uniform(0, 1, 100000)
hsamps = c * np.random.exponential(0.2, 100000)
N = 100
idx = np.array([True, False])  # dummy mask so the loop runs at least once
while idx.any():
    # Redraw every pair that fails the acceptance test.
    idx = samples > (f(hsamps) / g(hsamps))
    samples[idx] = np.random.uniform(0, 1, idx.sum())
    hsamps[idx] = c * np.random.exponential(0.2, idx.sum())
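
To compare, one could time both versions with timeit; a rough sketch (the helper names are just for illustration, and it assumes f, g, and c are defined as above):

import timeit

def one_at_a_time(N=1000):
    # First approach: accept/reject one candidate per iteration.
    samples = []
    while len(samples) < N:
        randou = np.random.uniform(0, 1)
        randoh = c * np.random.exponential(0.2)
        if randou <= f(randoh) / g(randoh):
            samples.append(randoh)
    return samples

def vectorized_redraw(N=1000):
    # Second approach: draw N pairs at once, redraw the rejected ones.
    samples = np.random.uniform(0, 1, N)
    hsamps = c * np.random.exponential(0.2, N)
    idx = np.array([True])
    while idx.any():
        idx = samples > (f(hsamps) / g(hsamps))
        samples[idx] = np.random.uniform(0, 1, idx.sum())
        hsamps[idx] = c * np.random.exponential(0.2, idx.sum())
    return hsamps

print(timeit.timeit(one_at_a_time, number=10))
print(timeit.timeit(vectorized_redraw, number=10))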

Solution

  • To take advantage of NumPy's speed, you'll want to work with large arrays instead of individual scalars processed in a loop. So for example, you could generate N samples like this:

        randous = np.random.uniform(0, 1, size=N)
        randohs = c * np.random.exponential(0.2, size=N)
    

    and then select those that pass your filter like this:

        mask = randous <= f(randohs) / g(randohs)
        return randohs[mask]
    

    The only problem is that there is no guarantee that randohs[mask] contains the desired number of values (or any values at all). So we might have to repeat this until we accumulate enough samples:

        samples = []
        while len(samples) < N:
            randohs = generate_samples()
            samples.extend(randohs)
        samples = samples[:N]
    

    Despite using a while-loop, this will still be far faster than generating samples one at a time.
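
    If the acceptance rate is low, you can also size each batch from the rate observed in the first draw, so the loop usually finishes in one or two passes. A rough sketch, assuming the generate_samples function from the full listing below (the function name and the 1.1 oversampling margin are illustrative choices):

        def generate_enough(N, batch=10**5):
            # Use the first batch to estimate what fraction of candidates survive.
            kept = generate_samples(batch)
            accept_rate = max(len(kept) / batch, 1e-6)
            samples = list(kept)
            while len(samples) < N:
                # Oversample by 10% so one extra pass usually suffices.
                need = int((N - len(samples)) / accept_rate * 1.1) + 1
                samples.extend(generate_samples(need))
            return np.array(samples[:N])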


    import numpy as np
    from scipy.special import gamma
    import matplotlib.pyplot as plt
    
    def f(x): 
        return 1. / gamma(3) * x * np.exp(-1 * x)
    def g(x): 
        return c * lambd * np.exp(-1 * lambd * x)
    
    def generate_samples(N=10**5):
        # Vectorized draw: N candidates and N uniforms, filtered in one shot.
        randous = np.random.uniform(0, 1, size=N)
        randohs = c * np.random.exponential(0.2, size=N)
        mask = randous <= f(randohs) / g(randohs)
        return randohs[mask]
    
    lambd = .2
    c = (1 / lambd / gamma(3) * (2./(1-lambd)) ** 2 
         * np.exp(-1 * (1 - lambd) * (2. / (lambd - 1))))
    x = np.linspace(0, 50, 1000)
    
    samples = []
    N = 10**5
    # Keep drawing batches until at least N samples have been accepted.
    while len(samples) < N:
        randohs = generate_samples()
        samples.extend(randohs)
    samples = samples[:N]
    
    plt.hist(samples, 100, density=True, label='Simulated PDF')
    plt.plot(x, f(x), label='True PDF', lw=2)
    plt.xlim(0, 10)
    plt.show()
    

    [Figure: histogram of the simulated samples overlaid with the true PDF f(x)]
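
    As a final note, on NumPy 1.17+ you could swap in the newer Generator API to make the run reproducible; a minimal sketch (the seed and the function name are illustrative):

        rng = np.random.default_rng(seed=42)

        def generate_samples_rng(N=10**5):
            randous = rng.uniform(0, 1, size=N)
            randohs = c * rng.exponential(0.2, size=N)
            mask = randous <= f(randohs) / g(randohs)
            return randohs[mask]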