Tags: python, theano, elementwise-operations

Theano (Python): elementwise gradient


I'm trying to compute an elementwise gradient with Theano,

e.g.,

output f(x): a 5-by-1 vector,

with respect to input X: a 5-by-1 vector.

I can do this like so:

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.vector('X')

    f = X * 3

    # one scan step per output element: the gradient of f[j] w.r.t. all of X
    rfrx, updates = theano.scan(lambda j, f, X: T.grad(f[j], X),
                                sequences=T.arange(X.shape[0]),
                                non_sequences=[f, X])

    fcn_rfrx = theano.function([X], rfrx)

    fcn_rfrx(np.ones(5,).astype('float32'))

and the result is

array([[ 3.,  0.,  0.,  0.,  0.],
       [ 0.,  3.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.],
       [ 0.,  0.,  0.,  3.,  0.],
       [ 0.,  0.,  0.,  0.,  3.]], dtype=float32)

but since this is not efficient, I want to get a 5-by-1 vector as the result,

by doing something like:

    rfrx, updates = theano.scan(lambda j, f, X: T.grad(f[j], X[j]),
                                sequences=T.arange(X.shape[0]),
                                non_sequences=[f, X])

which doesn't work (T.grad complains, since X[j] is a fresh subtensor that the graph of f[j] was never built from).
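
A variant that does build is to take the gradient with respect to all of X and then keep only the j-th entry, though I am not sure whether this actually avoids computing the other entries (this is a sketch of the same scan as above, with the indexing moved outside T.grad; fcn_diag is just an illustrative name):

    # gradient of f[j] w.r.t. all of X, then select only the diagonal entry
    rfrx, updates = theano.scan(lambda j, f, X: T.grad(f[j], X)[j],
                                sequences=T.arange(X.shape[0]),
                                non_sequences=[f, X])
    fcn_diag = theano.function([X], rfrx)
    fcn_diag(np.ones(5,).astype('float32'))  # -> array([ 3.,  3.,  3.,  3.,  3.], dtype=float32)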

Is there any way to do this? (Sorry for the bad formatting; I'm new here and learning.)


(I have added a clearer example):

given the input vector: x[1], x[2], ..., x[n]

and the output vector: y[1], y[2], ..., y[n],

where y[i] = f(x[i]),

I want the result of

df(x[i])/dx[i] only,

and not the

df(x[i])/dx[j] for i ≠ j,

for computational efficiency (n is the number of data points, > 10000).
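
(Concretely, for the f = X*3 example above, the result I am after is just the diagonal of the matrix shown there:

    array([ 3.,  3.,  3.,  3.,  3.], dtype=float32)

rather than the full 5-by-5 Jacobian.)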


Solution

  • You are looking for theano.tensor.jacobian.

    import theano
    import theano.tensor as T

    x = T.fvector()
    # p[i] = sum_k x[k] ** i, a 5-vector of scalar functions of x
    p = T.as_tensor_variable([(x ** i).sum() for i in range(5)])

    # full Jacobian dp[i]/dx[k], of shape (5, len(x))
    j = T.jacobian(p, x)

    f = theano.function([x], [p, j])
    

    Now evaluating yields

    In [31]: f([1., 2., 3.])
    Out[31]: 
    [array([  3.,   6.,  14.,  36.,  98.], dtype=float32),
     array([[   0.,    0.,    0.],
            [   1.,    1.,    1.],
            [   2.,    4.,    6.],
            [   3.,   12.,   27.],
            [   4.,   32.,  108.]], dtype=float32)]
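
    (Row i of the Jacobian is the gradient of p[i] = (x ** i).sum(), i.e. i * x ** (i - 1) evaluated at [1., 2., 3.].)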
    

    If you are interested in only one, or a few, partial derivatives, you can also obtain just those. It would be necessary to take a close look at Theano's optimization rules to see how much more efficient this gets (a benchmark is a good first test). (It is possible that indexing into the gradient already makes it clear to Theano that it does not need to calculate the rest.)

    x = T.fscalar()
    y = T.fvector()
    # assemble the full input from the part we differentiate (x) and the rest (y)
    z = T.concatenate([x.reshape((1,)), y.reshape((-1,))])

    e = (z ** 2).sum()
    # gradient of the cost with respect to x only, not all of z
    g = T.grad(e, wrt=x)

    ff = theano.function([x, y], [e, g])
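
    Evaluating this, using the same style of call as above:

    In [32]: ff(1., [2., 3.])
    Out[32]: [array(14.0, dtype=float32), array(2.0, dtype=float32)]

    (e is 1² + 2² + 3² = 14, and g is de/dx = 2x = 2.)

    One more remark on the clarified question: if y[i] really depends only on x[i], the Jacobian is diagonal, and the gradient of the summed output is exactly that diagonal, with no scan at all. Here is a minimal sketch of that shortcut, assuming strict elementwise structure and reusing the question's f = X * 3:

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.fvector('X')
    f = X * 3

    # d(f.sum())/dX[i] = df[i]/dX[i], because all cross terms df[i]/dX[j] are zero
    diag = T.grad(f.sum(), X)

    fcn = theano.function([X], diag)
    fcn(np.ones(5, dtype='float32'))  # -> array([ 3.,  3.,  3.,  3.,  3.], dtype=float32)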