reinforcement-learning

Any example code of REINFORCE algorithm proposed by Williams?


Does anyone know of any example code for the algorithm Ronald J. Williams proposed in
"A class of gradient-estimating algorithms for reinforcement learning in neural networks"?


Solution

  • Yes, do a search on GitHub, and you will get a whole bunch of results:

    GitHub: WILLIAMS+REINFORCE

    The most popular results use this code from the PyBrain library (in Python):

    __author__ = 'Thomas Rueckstiess, [email protected]'
    
    from pybrain.rl.learners.directsearch.policygradient import PolicyGradientLearner
    from scipy import mean, ravel, array
    
    
    class Reinforce(PolicyGradientLearner):
    """ Reinforce is a gradient estimator technique by Williams (see
        "Simple Statistical Gradient-Following Algorithms for
        Connectionist Reinforcement Learning"). It uses optimal
        baselines and calculates the gradient with the log likelihoods
        of the taken actions. """
    
    def calculateGradient(self):
        # normalize rewards
        # self.ds.data['reward'] /= max(ravel(abs(self.ds.data['reward'])))
    
        # initialize variables
        returns = self.dataset.getSumOverSequences('reward')
        seqidx = ravel(self.dataset['sequence_index'])
    
        # sum of sequences up to n-1
        loglhs = [sum(self.loglh['loglh'][seqidx[n]:seqidx[n + 1], :]) for n in range(self.dataset.getNumSequences() - 1)]
        # append sum of last sequence as well
        loglhs.append(sum(self.loglh['loglh'][seqidx[-1]:, :]))
        loglhs = array(loglhs)
    
        baselines = mean(loglhs ** 2 * returns, 0) / mean(loglhs ** 2, 0)
        # TODO: why gradient negative?
        gradient = -mean(loglhs * (returns - baselines), 0)
    
        return gradient
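
    If you prefer something that does not depend on PyBrain, here is a minimal, self-contained sketch of Williams' REINFORCE update on a toy 3-armed bandit, written with plain NumPy. All names (N_ARMS, TRUE_MEANS, sample_episodes, the learning rate, and the reward distribution) are illustrative assumptions, not part of the paper or of PyBrain; the baseline is the same per-parameter "optimal" baseline computed in the snippet above, E[g^2 * R] / E[g^2].

    # Illustrative sketch only: toy problem, names and constants are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    N_ARMS = 3                               # toy 3-armed bandit
    TRUE_MEANS = np.array([0.2, 0.5, 0.8])   # assumed expected reward of each arm
    theta = np.zeros(N_ARMS)                 # parameters of a softmax policy

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sample_episodes(theta, batch=64):
        """Play `batch` one-step episodes; return score-function terms and rewards."""
        probs = softmax(theta)
        actions = rng.choice(N_ARMS, size=batch, p=probs)
        rewards = rng.normal(TRUE_MEANS[actions], 0.1)
        grads = np.eye(N_ARMS)[actions] - probs   # grad log pi(a) for a softmax policy
        return grads, rewards

    ALPHA = 0.5
    for step in range(200):
        grads, rewards = sample_episodes(theta)
        # per-parameter "optimal" baseline, as in the PyBrain code above
        baseline = (grads ** 2 * rewards[:, None]).mean(0) / (grads ** 2).mean(0)
        # REINFORCE update: ascend the estimated gradient of expected reward
        theta += ALPHA * (grads * (rewards[:, None] - baseline)).mean(0)

    print("learned policy:", softmax(theta))      # should favour the best arm (index 2)

    Note that the PyBrain method above returns a negated gradient (its own TODO comment questions this, presumably because the surrounding learner treats it as a quantity to minimize); the sketch here simply performs gradient ascent on expected reward.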