
Why is my Python implementation of the Metropolis algorithm (MCMC) so slow?


I'm trying to implement the Metropolis algorithm (a simpler version of the Metropolis-Hastings algorithm) in Python.

Here is my implementation:

import numpy as np

def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    """
    Metropolis Algorithm using a Gaussian proposal distribution.
    p: distribution that we want to sample from (can be unnormalized)
    z0: Initial sample
    sigma: standard deviation of the proposal normal distribution.
    n_samples: number of final samples that we want to obtain.
    burn_in: number of initial samples to discard.
    m: this number is used to take every mth sample at the end
    """
    # List of samples, check feasibility of first sample and set z to first sample
    sample_list = [z0]
    _ = p(z0) 
    z = z0
    # set a counter of samples for burn-in
    n_sampled = 0

    while len(sample_list[::m]) < n_samples:
        # Sample a candidate from Normal(mu, sigma),  draw a uniform sample, find acceptance probability
        cand = np.random.normal(loc=z, scale=sigma)
        u = np.random.rand()
        try:
            prob = min(1, p(cand) / p(z))
        except (OverflowError, ValueError):
            continue
        n_sampled += 1

        if prob > u:
            z = cand  # accept and make candidate the new sample

        # do not add burn-in samples
        if n_sampled > burn_in:
            sample_list.append(z)

    # Finally, take every m-th sample in order to achieve independence
    return np.array(sample_list)[::m]

When I apply my algorithm to an exponential density it runs in very little time. However, on a t-distribution it takes ages, even though it is not doing that many more calculations. Here is how you can reproduce the problem:

import numpy as np
import matplotlib.pyplot as plt

t_samples = Metropolis_Gaussian(pdf_t, 3, 1, 1000, 1000, m=100)
plt.hist(t_samples, density=True, bins=15, label='histogram of samples')
x = np.linspace(min(t_samples), max(t_samples), 100)
plt.plot(x, pdf_t(x), label='t pdf')
plt.xlim(min(t_samples), max(t_samples))
plt.title("Sampling t distribution via Metropolis")
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend()

This code takes quite a long time to run and I'm not sure why. In my code for Metropolis_Gaussian, I am trying to improve efficiency by

  1. Not adding repeated samples to the list
  2. Not recording burn-in samples

The function pdf_t is defined as follows

from scipy.stats import t
def pdf_t(x, df=10):
    return t.pdf(x, df=df)

Solution

  • I answered a similar question previously. Many of the things I mentioned there (not computing the current likelihood at every iteration, pre-computing the random innovations, etc.) can be used here.

    Other improvements for your implementation would be not using a list to store your samples. Instead you should preallocate the memory for the samples and store them as an array. Something like samples = np.zeros(n_samples) would be more efficient than appending to a list at every iteration.
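
    As a tiny illustration of the pattern (a sketch with made-up fill values, not a benchmark), the preallocated array replaces the grow-a-list-then-convert approach while producing the same result:

```python
import numpy as np

n_samples = 1000

# Grow-then-convert: the list is resized as it grows and converted at the end
as_list = []
for i in range(n_samples):
    as_list.append(float(i))
from_list = np.array(as_list)

# Preallocate once, then write by index
samples = np.zeros(n_samples)
for i in range(n_samples):
    samples[i] = float(i)

assert np.array_equal(from_list, samples)
```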

    You already mentioned that you tried to improve efficiency by not recording the burn-in samples. This is a good idea. You can also do a similar trick with the thinning by only recording every m-th sample since you discard these in your return statement with np.array(sample_list)[::m] anyway. You would do this by changing:

       # do not add burn-in samples
        if n_sampled > burn_in:
            sample_list.append(z)
    

    to

        # Only keep iterations after burn-in and for every m-th iteration
        if n_sampled > burn_in and n_sampled % m == 0:
            samples[(n_sampled - burn_in) // m] = z
    

    It's also worth noting that you don't need to compute min(1, p(cand) / p(z)) and can get away with only computing p(cand) / p(z). I realise that formally the min is necessary (to ensure the probabilities are bounded between 0 and 1). But, computationally, we don't need the min, since if p(cand) / p(z) > 1 then p(cand) / p(z) is always greater than u.
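
    A quick numeric check of that claim (made-up ratio values, not output from the sampler): for uniform u in [0, 1), dropping the min never changes an accept/reject decision, because r > 1 already implies r > u:

```python
import numpy as np

rng = np.random.default_rng(42)
ratios = rng.uniform(0.0, 3.0, size=10_000)  # made-up likelihood ratios, some above 1
us = rng.random(10_000)                      # uniform draws in [0, 1)

with_min = np.minimum(1.0, ratios) > us   # formal acceptance rule
without_min = ratios > us                 # shortcut without the min

# If r > 1, then min(1, r) = 1 > u and also r > u, so the decisions agree;
# if r <= 1, the two expressions are identical.
assert np.array_equal(with_min, without_min)
```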

    Putting this all together, precomputing the random innovations and the uniform draws u, and only computing the likelihoods when you really have to, I came up with:

    def my_Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
        # Pre-allocate memory for the samples (much more efficient than appending to a list)
        samples = np.zeros(n_samples)

        # Store initial value
        samples[0] = z0
        z = z0
        # Compute the current likelihood
        l_cur = p(z)

        # Total number of iterations needed to achieve the desired number of samples
        iters = (n_samples * m) + burn_in

        # Pre-draw the random innovations and uniform thresholds outside the loop
        innov = np.random.normal(loc=0, scale=sigma, size=iters)
        u = np.random.rand(iters)

        for i in range(iters):
            # Random walk innovation on z
            cand = z + innov[i]

            # Compute candidate likelihood
            l_cand = p(cand)

            # Accept or reject candidate
            if l_cand / l_cur > u[i]:
                z = cand
                l_cur = l_cand

            # Only keep iterations after burn-in and for every m-th iteration
            if i > burn_in and i % m == 0:
                samples[(i - burn_in) // m] = z

        return samples
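
    For a quick end-to-end sanity check, here is a sketch that runs the sampler against a standard normal target (scipy.stats.norm is used here purely for illustration; any unnormalized density works the same way, and a compact copy of the function is included so the snippet runs standalone):

```python
import numpy as np
from scipy.stats import norm

def my_Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    # Compact copy of the sampler above so this snippet is self-contained
    samples = np.zeros(n_samples)
    samples[0] = z0
    z = z0
    l_cur = p(z)
    iters = (n_samples * m) + burn_in
    innov = np.random.normal(loc=0, scale=sigma, size=iters)
    u = np.random.rand(iters)
    for i in range(iters):
        cand = z + innov[i]
        l_cand = p(cand)
        if l_cand / l_cur > u[i]:
            z = cand
            l_cur = l_cand
        if i > burn_in and i % m == 0:
            samples[(i - burn_in) // m] = z
    return samples

np.random.seed(0)
# Target: standard normal density, so the draws should have mean near 0 and std near 1
draws = my_Metropolis_Gaussian(norm.pdf, 0.0, 1.0, n_samples=500, burn_in=500, m=5)
print(round(draws.mean(), 2), round(draws.std(), 2))
```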
    

    If we take a look at performance we find this implementation to be 2x faster than the original, which isn't bad for a few minor changes.

    In [1]: %timeit Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
    205 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [2]: %timeit my_Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
    102 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)