
torch backward through gModule


I have a graph in which the input x has two paths to reach y. They are combined with a gModule that uses CMulTable. Now if I do gModule:backward(x, y), I get a table of two values. Do they correspond to the error derivatives coming from the two paths?

But since path2 contains other nn layers, I suppose I need to derive the derivatives along this path in a stepwise fashion. But then why did I get a table of two values for dy/dx?

To make things clearer, code to test this is as follows:

input1 = nn.Identity()()
input2 = nn.Identity()()
score = nn.CAddTable()({nn.Linear(3, 5)(input1),nn.Linear(3, 5)(input2)})
g = nn.gModule({input1, input2}, {score})  -- gModule

mlp = nn.Linear(3,3) -- path2 layer

x = torch.rand(3,3)
x_p = mlp:forward(x)
result = g:forward({x,x_p})
error = torch.rand(result:size())
gradient1 = g:backward(x, error)  -- this is a table of 2 tensors
gradient2 = g:backward(x_p, error)  -- this is also a table of 2 tensors

So what is wrong with my steps?

P.S. Perhaps I have found the reason: g:backward({x, x_p}, error) results in the same table. So I guess the two values stand for dy/dx and dy/dx_p respectively.
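
If that is the case, I suppose the total gradient with respect to x would come from chaining the second value back through mlp (the path2 layer) and adding the direct term. A rough sketch, continuing the code above:

grads = g:backward({x, x_p}, error)     -- {dy/dx (direct path), dy/dx_p (path2)}
grad_path2 = mlp:backward(x, grads[2])  -- chain dy/dx_p back through the path2 layer
grad_x_total = grads[1] + grad_path2    -- combine both paths to get the full dy/dx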


Solution

  • I think you simply made a mistake constructing your gModule. gradInput of every nn.Module has to have exactly the same structure as its input - that is the way backprop works.

    Here's an example of how to create a module like yours using nngraph:

    require 'torch'
    require 'nn'
    require 'nngraph'
    
    function CreateModule(input_size)
        local input = nn.Identity()()   -- network input
    
        local nn_module_1 = nn.Linear(input_size, 100)(input)
        local nn_module_2 = nn.Linear(100, input_size)(nn_module_1)
    
        local output = nn.CMulTable()({input, nn_module_2})
    
        -- pack a graph into a convenient module with standard API (:forward(), :backward())
        return nn.gModule({input}, {output})
    end
    
    
    input = torch.rand(30)
    
    my_module = CreateModule(input:size(1))
    
    output = my_module:forward(input)
    criterion_err = torch.rand(output:size())
    
    gradInput = my_module:backward(input, criterion_err)
    print(gradInput)
    

    UPDATE

    As I said, the gradInput of every nn.Module has to have exactly the same structure as its input. So, if you define your module as nn.gModule({input1, input2}, {score}), your gradInput (the result of the backward pass) will be a table of gradients w.r.t. input1 and input2, which in your case are x and x_p.

    The only question that remains is: why on Earth don't you get an error when you call:

    gradient1 = g:backward(x, error) 
    gradient2 = g:backward(x_p, error)
    

    An exception should be raised, because the first argument must be a table of two tensors, not a single tensor. Well, most (perhaps all) Torch modules don't use the input argument when computing :backward(input, gradOutput); they usually store a copy of the input from the last :forward(input) call. In fact, this argument is so useless that modules don't even bother to verify it.
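
    You can see this with the module from your question: since g caches its inputs during :forward(), the malformed calls silently produce the same gradients as the correct one. A sketch based on your code:

    result = g:forward({x, x_p})              -- the forward pass caches the inputs inside g
    grads_ok  = g:backward({x, x_p}, error)   -- correct call: input is a table of two tensors
    grads_bad = g:backward(x, error)          -- wrong input type, but it is never inspected
    -- both calls return the same table of two gradients: {dy/dx, dy/dx_p}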