Tags: python, tensorflow, calculus

Calculating tensorflow gradients


I am confused by the example in the tensorflow gradient documentation for computing the gradient.

a = tf.constant(0.)
b = 2 * a
g = tf.gradients(a + b, [a, b])

with tf.Session() as sess:
    print(sess.run(g))

which gives [3.0, 1.0]

I feel like I am really missing something obvious, but if a is essentially 0, then b and therefore a + b are also 0. So how does differentiating zero with respect to a and b give you something like [3.0, 1.0]?

I believe I am misunderstanding TensorFlow's structure/syntax here.


Solution

  • For comparison, consider the real-valued function f : ℝ → ℝ of one real variable, given by f(x) = 10 x. Here, f'(x) = 10, regardless of the value of x, so in particular f'(0) = 10.

    Similarly, as explained in the tutorial, tf.gradients differentiates the expression, not its value: with b(a) = 2 a, the total derivative of a ↦ a + b(a) = 3 a is 3, while the partial derivative of a + b with respect to b is 1. Hence the total derivative of (a, b) ↦ a + b is (3, 1), independent of a.

    For a less trivial example, let us consider

    a = tf.constant(5.)
    b = 2 * a
    g = tf.gradients(a**3 + 2*b**2, [a, b])
    
    with tf.Session() as sess:
        print(sess.run(g))
    

    Here, the total derivative with respect to a is the derivative of a ↦ a³ + 2(2 a)² = a³ + 8 a², which is a ↦ 3 a² + 16 a, while the derivative with respect to b is b ↦ 4 b, i.e. a ↦ 4 b(a) = 8 a. Thus, at a = 5, we expect the result to be (3 · 5² + 16 · 5, 8 · 5) = (155, 40), and running the code, that is exactly what we get.
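
    To see that these numbers are not TensorFlow magic, here is a small sketch in plain Python (no TensorFlow) that reproduces both results with symmetric finite differences. The helper num_deriv is introduced here for illustration; note that for the gradient with respect to a we substitute b = 2 a (the total derivative), while for b we hold a fixed at its value.

    ```python
    def num_deriv(f, x, h=1e-5):
        # symmetric finite-difference approximation of f'(x)
        return (f(x + h) - f(x - h)) / (2 * h)

    # First example: g = a + b with b = 2 * a, evaluated at a = 0.
    print(num_deriv(lambda a: a + 2 * a, 0.0))  # ~3.0, even though a = 0
    print(num_deriv(lambda b: 0.0 + b, 0.0))    # ~1.0

    # Second example: g = a**3 + 2*b**2 with b = 2 * a, at a = 5 (so b = 10).
    print(num_deriv(lambda a: a**3 + 2 * (2 * a)**2, 5.0))  # ~155.0
    print(num_deriv(lambda b: 5.0**3 + 2 * b**2, 10.0))     # ~40.0
    ```

    The point is the same as in the f(x) = 10 x example above: the derivative depends on the expression being differentiated, not on the numerical value the expression happens to take at the evaluation point.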