Tags: tensorflow, gradient-descent, reinforcement-learning

Eligibility traces in TensorFlow


According to Sutton's book, Reinforcement Learning: An Introduction, the update equation for the network weights is:

theta = theta + alpha * delta * e_t

where e_t is the eligibility trace. This is similar to a gradient-descent update with an extra factor e_t.
Can this eligibility trace be included in tf.train.GradientDescentOptimizer in TensorFlow?
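For context (this sketch is not part of the original question), here is one way delta and e_t arise in a single TD(lambda) step with a linear value function; all names and values below are illustrative:

    import numpy as np

    # Illustrative TD(lambda) step with a linear value function v(s) = theta @ x.
    alpha, gamma, lam = 0.1, 0.99, 0.9
    theta = np.zeros(4)
    e = np.zeros(4)                                       # eligibility trace e_t

    x, x_next, reward = np.ones(4), np.zeros(4), 1.0      # stand-in transition
    delta = reward + gamma * theta @ x_next - theta @ x   # TD error delta_t
    e = gamma * lam * e + x        # accumulate trace; grad of v w.r.t. theta is x
    theta = theta + alpha * delta * e                     # the update in question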


Solution

  • Here's a simple example of using tf.contrib.layers.scale_gradient to do elementwise multiplication of gradients. In the forward pass it's just an identity op, and in the backward pass it multiplies gradients by its second argument.

    import tensorflow as tf

    with tf.Graph().as_default():
      some_value = tf.constant([0., 0., 0.])
      # Identity in the forward pass; in the backward pass the incoming
      # gradients are multiplied elementwise by the second argument.
      scaled = tf.contrib.layers.scale_gradient(some_value, [0.1, 0.2, 0.3])
      # The gradient of sum(scaled) w.r.t. some_value would otherwise be all
      # ones, so what comes back is exactly the scale factors.
      (some_value_gradient,) = tf.gradients(tf.reduce_sum(scaled), some_value)
      with tf.Session():
        print(scaled.eval())
        print(some_value_gradient.eval())
    

    Prints:

    [ 0.  0.  0.]
    [ 0.1         0.2         0.30000001]
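
    To connect this back to the question, here is a minimal sketch (my own assumption, not from the original answer) of placing scale_gradient between the loss and tf.train.GradientDescentOptimizer, so each weight's gradient is multiplied by its eligibility trace before the update. The loss and trace values are stand-ins for whatever delta and e_t you actually compute:

    import tensorflow as tf

    with tf.Graph().as_default():
      theta = tf.Variable([1.0, 2.0, 3.0])
      # Hypothetical eligibility trace e_t; in practice you would maintain
      # this yourself, e.g. e = gamma * lam * e + gradient.
      trace = tf.constant([0.5, 1.0, 2.0])

      # Identity in the forward pass; multiplies d(loss)/d(theta) elementwise
      # by trace in the backward pass.
      scaled_theta = tf.contrib.layers.scale_gradient(theta, trace)
      loss = tf.reduce_sum(tf.square(scaled_theta))  # stand-in loss

      train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

      with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)
        # Each component of theta moved by lr * trace * (plain gradient).
        print(sess.run(theta))

    Updating the trace itself (e.g. with tf.assign on a non-trainable variable) still has to be done as a separate op alongside the optimizer step.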