Tags: tensorflow, gradient-descent, reinforcement-learning

Eligibility traces in TensorFlow


According to Sutton's book, Reinforcement Learning: An Introduction, the update equation for the network weights is:

theta = theta + alpha * delta * e_t

where e_t is the eligibility trace. This is similar to a gradient-descent update with an extra factor e_t.
Can this eligibility trace be included in tf.train.GradientDescentOptimizer in TensorFlow?
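For context (this sketch is not part of the original question), here is one way delta and e_t arise in a single TD(lambda) step with a linear value function; all names and values below are illustrative:

    import numpy as np

    # Illustrative TD(lambda) step with a linear value function v(s) = theta @ x.
    alpha, gamma, lam = 0.1, 0.99, 0.9
    theta = np.zeros(4)
    e = np.zeros(4)                                       # eligibility trace e_t

    x, x_next, reward = np.ones(4), np.zeros(4), 1.0      # stand-in transition
    delta = reward + gamma * theta @ x_next - theta @ x   # TD error delta_t
    e = gamma * lam * e + x        # accumulate trace; grad of v w.r.t. theta is x
    theta = theta + alpha * delta * e                     # the update in question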


Solution

  • Here's a simple example of using tf.contrib.layers.scale_gradient to do elementwise multiplication of gradients. In the forward pass it's just an identity op, and in the backward pass it multiplies gradients by its second argument.

    import tensorflow as tf

    with tf.Graph().as_default():
      some_value = tf.constant([0., 0., 0.])
      # Identity in the forward pass; in the backward pass the incoming
      # gradients are multiplied elementwise by the second argument.
      scaled = tf.contrib.layers.scale_gradient(some_value, [0.1, 0.2, 0.3])
      # The gradient of sum(scaled) w.r.t. some_value would otherwise be all
      # ones, so what comes back is exactly the scale factors.
      (some_value_gradient,) = tf.gradients(tf.reduce_sum(scaled), some_value)
      with tf.Session():
        print(scaled.eval())
        print(some_value_gradient.eval())
    

    Prints:

    [ 0.  0.  0.]
    [ 0.1         0.2         0.30000001]
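
    To connect this back to the question, here is a minimal sketch (my own assumption, not from the original answer) of placing scale_gradient between the loss and tf.train.GradientDescentOptimizer, so each weight's gradient is multiplied by its eligibility trace before the update. The loss and trace values are stand-ins for whatever delta and e_t you actually compute:

    import tensorflow as tf

    with tf.Graph().as_default():
      theta = tf.Variable([1.0, 2.0, 3.0])
      # Hypothetical eligibility trace e_t; in practice you would maintain
      # this yourself, e.g. e = gamma * lam * e + gradient.
      trace = tf.constant([0.5, 1.0, 2.0])

      # Identity in the forward pass; multiplies d(loss)/d(theta) elementwise
      # by trace in the backward pass.
      scaled_theta = tf.contrib.layers.scale_gradient(theta, trace)
      loss = tf.reduce_sum(tf.square(scaled_theta))  # stand-in loss

      train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

      with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)
        # Each component of theta moved by lr * trace * (plain gradient).
        print(sess.run(theta))

    Updating the trace itself (e.g. with tf.assign on a non-trainable variable) still has to be done as a separate op alongside the optimizer step.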