The following is a code snippet which, given a state, generates an action from a state-dependent distribution (prob_policy). The weights of the graph are then updated according to a loss that is -1 times the log-probability of the selected action. In the following example, both the mean (mu) and the diagonal scale (sigma) of the MultivariateNormal are trainable/learned.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)
action = prob_policy.sample()
picked_action_prob = prob_policy.prob(action)
loss = -tf.log(picked_action_prob)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0., 0.], 0)
    _, action_loss = sess.run([train_op, loss], {state: state_input})
    print(action_loss)
However, when I replace this line
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)
with the following line (and comment out the lines which generate the sigma layer and squeeze it)
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1.,1.])
I get the following error:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'fully_connected/weights:0' shape=(2, 2) dtype=float32_ref>", "<tf.Variable 'fully_connected/biases:0' shape=(2,) dtype=float32_ref>"] and loss Tensor("Neg:0", shape=(), dtype=float32).
I don't understand why this is happening. Shouldn't it still be able to take the gradient with respect to the weights in the mu layer? Why does making the scale of the distribution constant suddenly make it non-differentiable?
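For reference, one way to check which variables the loss actually reaches (a small diagnostic sketch on top of the graph above, not part of the original snippet) is to ask tf.gradients directly; variables the loss does not depend on come back as None:

grads = tf.gradients(loss, tf.trainable_variables())
for var, grad in zip(tf.trainable_variables(), grads):
    # A None gradient means the variable is disconnected from the loss in the graph.
    print(var.name, "OK" if grad is not None else "no gradient (disconnected)")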
System Details:
There is an issue caused by some caching we do inside of MVNDiag (and other subclasses of TransformedDistribution) for invertibility. If you add a + 0 after your .sample() (as a workaround), the gradient will work: the + 0 yields a new tensor that is no longer the cached sample, so log_prob recomputes the inverse transform instead of short-circuiting through the cache, and the dependence on mu is restored in the graph.

Also, I'd suggest using dist.log_prob(..) instead of tf.log(dist.prob(..)); the numerics are better.
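To see the numerics point concretely, here is a tiny sketch (it uses a scalar Normal, which is not in the snippets above): for a point far out in the tail, prob underflows to zero in float32, so tf.log of it is -inf, while log_prob is evaluated directly in log space and stays finite.

import tensorflow as tf
import tensorflow_probability as tfp

d = tfp.distributions.Normal(loc=0., scale=1.)
x = tf.constant(40.)
with tf.Session() as sess:
    print(sess.run(tf.log(d.prob(x))))  # -inf: prob(40.) underflows to 0 in float32
    print(sess.run(d.log_prob(x)))      # roughly -800.9, computed directly in log space

With both changes applied (the + 0 after sample() and log_prob in the loss), the full snippet becomes: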
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(  # unused once scale_diag is fixed below
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1., 1.])
# The "+ 0" sidesteps the sample/log_prob caching, so gradients flow back to mu.
action = prob_policy.sample() + 0
# log_prob is numerically better behaved than tf.log(prob).
loss = -prob_policy.log_prob(action)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0., 0.], 0)
    _, action_loss = sess.run([train_op, loss], {state: state_input})
    print(action_loss)