In Theano, why am I getting a theano.gradient.DisconnectedInputError when the input is clearly connected?


Traceback (most recent call last):
  File "/home/axoren1/SmokingDataProject/Rotation Test.py", line 40, in <module>
    dJ = T.grad((R(n, t) - R(n, angles)).norm(2), t)
  File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 529, in grad
    handle_disconnected(elem)
  File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 516, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Theta

What does this mean? Below is my code and an explanation of why I think this error is vacuous.

import numpy as np
import theano
import theano.tensor as T
import theano.tensor.nlinalg as Tn

n = 5

angles = 2 * np.pi * np.random.rand(n, 1)

def R(n, angles):
    sines   = T.sin(angles)
    cosines = T.cos(angles)

    def r(i, j):
        sign = -1 * -1 ** ((i + j) % 2)
        c = cosines[i - 1] * cosines[j]
        s = T.prod(sines[i:j])

        return sign * c * s

    R = T.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            T.inc_subtensor(R[i:i+1][j:j+1], r(i, j))
    for i in range(0, n - 1):
        T.inc_subtensor(R[i+1:i+2][i:i+1], sines[i])

    return R

guess = np.random.rand(n, 1)

t = T.vector("Theta")
for i in range(100):
    J = (R(n, t) - R(n, angles)).norm(2)
    dJ = T.grad((R(n, t) - R(n, angles)).norm(2), t)
    guess -= dJ.eval({t:guess})
    print J.eval({t:guess}), guess

As you can see, the Theta node is defined and used by the cost function. I don't see how the function R is discontinuous at all. Why is this breaking?


Solution

  • The problem is that the results of the inc_subtensor calls are never assigned back to R.

    Instead of

    T.inc_subtensor(R[i:i+1][j:j+1], r(i, j))

    and

    T.inc_subtensor(R[i+1:i+2][i:i+1], sines[i])

    use

    R = T.inc_subtensor(R[i:i+1][j:j+1], r(i, j))

    and

    R = T.inc_subtensor(R[i+1:i+2][i:i+1], sines[i])

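    inc_subtensor is a symbolic operation: it does not modify R in place, but returns a new variable representing the result of incrementing the given subtensor by the given value. Because the original code discards those return values, R(n, t) returns the untouched T.zeros((n, n)), which does not depend on Theta at all, hence the DisconnectedInputError.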
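
To see why a discarded return value disconnects the input, here is a toy model in plain Python. It is only a sketch: the Node class and the inc function are hypothetical stand-ins invented for illustration, not Theano's API. They mimic the one property that matters here, namely that an update builds and returns a new graph node instead of changing the old one.

```python
# Toy stand-in for a symbolic graph. Node and inc are hypothetical
# names for illustration only; they are not part of Theano.
class Node:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def depends_on(self, other):
        # Walk up through the parents looking for `other`.
        return self is other or any(p.depends_on(other) for p in self.parents)


def inc(target, value):
    # Functional update: builds and returns a NEW node whose parents
    # are `target` and `value`. `target` itself is left untouched.
    return Node("inc(%s, %s)" % (target.name, value.name), (target, value))


theta = Node("Theta")
zeros = Node("zeros")

# Result discarded, as in the question: R is still the plain zeros node.
R = zeros
inc(R, theta)
discarded_connected = R.depends_on(theta)

# Result assigned back, as in the answer: Theta is now an ancestor of R.
R = zeros
R = inc(R, theta)
assigned_connected = R.depends_on(theta)

print(discarded_connected, assigned_connected)  # prints: False True
```

In the first case nothing built from R can reach Theta, which is the situation T.grad reports as a disconnected input. In the second case the reassigned R carries Theta in its ancestry, so a gradient with respect to it is defined.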