Search code examples
pythonmontecarlomarkov-chainsmcmc

Metropolis-Hastings accept-reject implementation


I've been reading about the Metropolis-Hastings (MH) algorithm. Theoretically, I understood how the algorithm works. Now, I am trying to implement the MH algorithm using python.

I came across the following notebook. It suits exactly my problem since I want to fit my data by a straight line taking into consideration the measurement errors on my data. I am going to paste the code I am finding difficulties to understand:

# initial m, b
m,b = 2, 0

# step sizes
mstep, bstep = 0.1, 10.

# how many steps?
nsteps = 10000

chain = []
probs = []
naccept = 0

print 'Running MH for', nsteps, 'steps'

# First point:
L_old    = straight_line_log_likelihood(x, y, sigmay, m, b)
p_old    = straight_line_log_prior(m, b)
prob_old = np.exp(L_old + p_old)

for i in range(nsteps):
    # step
    mnew = m + np.random.normal() * mstep
    bnew = b + np.random.normal() * bstep

    # evaluate probabilities
    # prob_new = straight_line_posterior(x, y, sigmay, mnew, bnew)

    L_new    = straight_line_log_likelihood(x, y, sigmay, mnew, bnew)
    p_new    = straight_line_log_prior(mnew, bnew)
    prob_new = np.exp(L_new + p_new)

    if (prob_new / prob_old > np.random.uniform()):
        # accept
        m = mnew
        b = bnew
        L_old = L_new
        p_old = p_new
        prob_old = prob_new
        naccept += 1
    else:
        # Stay where we are; m,b stay the same, and we append them
        # to the chain below.
        pass

    chain.append((b,m))
    probs.append((L_old,p_old))
print 'Acceptance fraction:', naccept/float(nsteps)

The code is simple and easy, but I have difficulties in understanding how the MH is being implemented.

My question is in the chain.append (the third line from the bottom). The author is appending m and b whether they were accepted or rejected. Why? Shouldn't he append only the accepted points?


Solution

  • The following R code demonstrates why it is important to capture the rejected case:

    # 20 samples from 0 or 1. 1 has an 80% probability of being chosen.
    the.population <- sample(c(0,1), 20, replace = TRUE, prob=c(0.2, 0.8))
    
    # Create a new sample that only catches changes
    the.sample <- c(the.population[1])
    
    # Loop though the.population,
    # but only copy the.population to the.sample if the value changes
    for( i in 2:length(the.population))
    {
      if(the.population[i] != the.population[i-1])
        the.sample <- append(the.sample, the.population[i])
    }
    

    When this code runs, the.population gets 20 values, for example:

    0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1
    

    The probability of a 1 in this population is 16/20 or 0.8. Exactly the probability we expected...

    The sample, on the other hand, which only records changes, looks like this:

    0 1 0 1 0 1
    

    The probability of a 1 in the sample is 3/6 or 0.5.

    We are trying to build a distribution, rejecting the new values means that the old values are more likely than the new values. That needs to be captured so our distribution is correct.