Tags: python, tensorflow, tensorflow-gradient

How to assign custom gradient to TensorFlow op with multiple inputs


I'm trying to use TensorFlow's @tf.custom_gradient functionality to assign a custom gradient to a function with multiple inputs. I can put together a working setup for only one input, but not for two or more.

I've based my code on TensorFlow's custom_gradient documentation, which works just fine for one input, as in this example:

import tensorflow as tf
import os

# Suppress Tensorflow startup info
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

# Custom gradient decorator on a function,
# as described in documentation
@tf.custom_gradient
def my_identity(x):

    # The custom gradient
    def grad(dy):
        return dy

    # Return the result AND the gradient
    return tf.identity(x), grad

# Make a variable, run it through the custom op
x = tf.get_variable('x', initializer=1.)
y = my_identity(x)

# Calculate loss, make an optimizer, train the variable
loss = tf.abs(y)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = opt.minimize(loss)

# Start a TensorFlow session, initialize variables, train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train)

This example runs silently, then closes. No issues, no errors. The variable optimizes as expected. However, in my application, I need to do such a calculation with multiple inputs, so something of this form:

@tf.custom_gradient
def my_identity(x, z):

    def grad(dy):
        return dy

    return tf.identity(x*z), grad

Running this in place of the example (and adding another variable input to the call of my_identity) results in the following error output. As best I can tell, the last part of the error comes from the dynamic generation of the op -- the format matches the op-definition format used when ops are registered (though that's about all I know about it).

Traceback (most recent call last):
  File "testing.py", line 27, in <module>
    train = opt.minimize(loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 821, in _GradientsHelper
    _VerifyGeneratedGradients(in_grads, op)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 323, in _VerifyGeneratedGradients
    "inputs %d" % (len(grads), op.node_def, len(op.inputs)))
ValueError: Num gradients 2 generated for op name: "IdentityN"
op: "IdentityN"
input: "Identity"
input: "x/read"
input: "y/read"
attr {
  key: "T"
  value {
    list {
      type: DT_FLOAT
      type: DT_FLOAT
      type: DT_FLOAT
    }
  }
}
attr {
  key: "_gradient_op_type"
  value {
    s: "CustomGradient-9"
  }
}
 do not match num inputs 3

Based on other custom-gradient examples, I surmised that the issue was the lack of a supplied gradient for the second input argument. So, I changed my function to this:

@tf.custom_gradient
def my_identity(x, z):

    def grad(dy):
        return dy

    return tf.identity(x*z), grad, grad

This results in the following more familiar error:

Traceback (most recent call last):
  File "testing.py", line 22, in <module>
    y = my_identity(x, z)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 132, in _graph_mode_decorator
    result, grad_fn = f(*args)
ValueError: too many values to unpack (expected 2)

The @tf.custom_gradient decorator expects exactly two outputs from the function: the result and a single gradient function. So, I tried putting the two gradients into a tuple as (grad, grad) such that there would still only be "two" outputs for the function. TensorFlow rejected this too, this time because it can't call a tuple the way it calls a gradient function -- entirely reasonable, in hindsight.
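For the record, that tuple attempt looked something like this:

@tf.custom_gradient
def my_identity(x, z):

    def grad(dy):
        return dy

    # TensorFlow later tries to call this tuple as though it
    # were a single gradient function, which fails
    return tf.identity(x*z), (grad, grad)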

I've fussed around with the example some more, but to no avail. No matter what I try, I can't get the custom-defined gradient to deal with multiple inputs. I'm hoping that somebody with more knowledge of custom ops and gradients than I have will have a better idea -- thanks in advance for the help!


Solution

  • If the op takes multiple inputs, the number of gradients returned from the "grad" function must equal the number of input variables, even if we don't care about some of them.

    For example:

    @tf.custom_gradient
    def my_multiple(x, z):

        def grad(dy):
            # Return two gradients, one for 'x' and one for 'z'
            return (dy*z, dy*x)

        return tf.identity(x*z), grad
    

    Note that the second output of "my_multiple" is a function, not a gradient tensor.
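
    For completeness, here is a sketch of how the fixed op drops into the training setup from the question (this reuses the same TF 1.x calls as the question's working example; the name of the second variable, "z", is assumed):

    import tensorflow as tf
    import os

    # Suppress TensorFlow startup info
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    @tf.custom_gradient
    def my_multiple(x, z):

        def grad(dy):
            # One gradient per input, by the product rule:
            # d(x*z)/dx = z and d(x*z)/dz = x
            return dy*z, dy*x

        return tf.identity(x*z), grad

    # Make two variables, run both through the custom op
    x = tf.get_variable('x', initializer=1.)
    z = tf.get_variable('z', initializer=2.)
    y = my_multiple(x, z)

    # Calculate loss, make an optimizer, train the variables
    loss = tf.abs(y)
    opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train = opt.minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train)

    Note that "grad" closes over the forward-pass inputs "x" and "z", which is what lets it compute the product-rule gradients for both.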