Tags: graph, tensorflow, reinforcement-learning, gradient

Tensorflow: tf.gradients between different paths of the graph


I am working on a DDPG implementation, which requires computing one network's (below: critic) gradients with respect to another network's (below: actor) output. My code already uses queues instead of feed dicts for the most part, but I have not been able to do the same for this specific part yet:

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

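# toy stand-ins for the actor and critic networks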
actor = states * 1
critic = states * 1 + actions

grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    act = sess.run(actor, {states: [1.]})
    print(act)  # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri)  # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1)  # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> TypeError: Fetch argument has invalid type 'NoneType'

grad1 here computes the gradients w.r.t. the fed-in actions, which were previously computed by the actor. grad2 should do the same, but directly inside the graph, without feeding the actions back in, by evaluating actor directly. The problem is that grads_direct is None:

print(grads_direct)  # [None]

How can I achieve this? Is there a dedicated "evaluate this tensor" operation I could make use of? Thanks!


Solution

  • In your example you don't use actor to compute critic, so there is no path from actor to critic in the graph and the gradient is None.

    You should do:

    actor = states * 1
    critic = actor + actions  # change here: critic is now built from actor's output
    
    grads_indirect = tf.gradients(critic, actions)
    grads_direct = tf.gradients(critic, actor)
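
    With that change there is a path from actor to critic, so grads_direct is no longer [None]. As a quick check (continuing the snippet above, still with the question's TF1-style session API):

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
        print(grad2)  # -> [[1.]], i.e. d(critic)/d(actor) = 1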