python, keras, gradient, tensorflow2.0, gradienttape

Why are the gradients unconnected in the following function?


I am implementing a custom operation whose gradients must be calculated. The following is the function:

import tensorflow as tf

def difference(prod,box):
    # build the output element by element in a separate variable
    result = tf.Variable(tf.zeros((prod.shape[0],box.shape[1]),dtype=tf.float16))
    for i in tf.range(0,prod.shape[0]):
        for j in tf.range(0,box.shape[1]):
            result[i,j].assign((tf.reduce_prod(box[:,j])-tf.reduce_prod(prod[i,:]))/tf.reduce_prod(box[:,j]))
    return result

I am unable to calculate the gradients with respect to box: tape.gradient() returns None. Here is the code I have written for calculating the gradients:

prod = tf.constant([[3,4,5],[4,5,6],[1,3,3]],dtype=tf.float16)
box = tf.Variable([[4,5],[5,6],[5,7]],dtype=tf.float16)
with tf.GradientTape() as tape:
    tape.watch(box)
    loss = difference(prod,box)
    print(tape.gradient(loss,box))

I am not able to find the reason for the unconnected gradients. Is the result variable causing it? Kindly suggest an alternative implementation.


Solution

  • Yes, in order to calculate gradients we need a set of (differentiable) operations on your variables.

    You should re-write difference as a function of the two input tensors. The culprit is the use of assign on a freshly created tf.Variable: assign is a stateful operation that the tape cannot differentiate through, so result never becomes connected to box and tape.gradient returns None.

    Perhaps something like this:

    def difference(prod, box):
        # product over the rows of each box column -> shape (box.shape[1],)
        box_red = tf.reduce_prod(box, axis=0)
        # product over the columns of each prod row -> shape (prod.shape[0],)
        prod_red = tf.reduce_prod(prod, axis=1)
        # broadcast to shape (prod.shape[0], box.shape[1]) using only differentiable ops
        return (tf.expand_dims(box_red, 0) - tf.expand_dims(prod_red, 1)) / tf.expand_dims(box_red, 0)
    

    would get you the desired result.
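
    For completeness, here is a minimal sketch (using the rewritten difference above together with the prod and box values from the question) showing that the tape now returns a gradient for box instead of None:

    import tensorflow as tf

    def difference(prod, box):
        box_red = tf.reduce_prod(box, axis=0)
        prod_red = tf.reduce_prod(prod, axis=1)
        return (tf.expand_dims(box_red, 0) - tf.expand_dims(prod_red, 1)) / tf.expand_dims(box_red, 0)

    prod = tf.constant([[3, 4, 5], [4, 5, 6], [1, 3, 3]], dtype=tf.float16)
    box = tf.Variable([[4, 5], [5, 6], [5, 7]], dtype=tf.float16)

    with tf.GradientTape() as tape:
        loss = difference(prod, box)   # purely tensor ops, so the tape can trace them

    print(tape.gradient(loss, box))    # a (3, 2) float16 tensor, no longer None

    Note that because box is a tf.Variable, the tape watches it automatically, so the explicit tape.watch(box) from the question is not strictly needed.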