I have a graph as follows, where the input x has two paths to reach y. They are combined with a gModule that uses CMulTable. Now if I do gModule:backward(x, y), I get a table of two values. Do they correspond to the error derivatives derived from the two paths?
But since path2 contains other nn layers, I suppose I need to compute the derivatives along this path in a stepwise fashion. But why did I get a table of two values for dy/dx?
To make things clearer, code to test this is as follows:
input1 = nn.Identity()()
input2 = nn.Identity()()
score = nn.CAddTable()({nn.Linear(3, 5)(input1),nn.Linear(3, 5)(input2)})
g = nn.gModule({input1, input2}, {score}) -- gModule
mlp = nn.Linear(3,3) -- path2 layer
x = torch.rand(3,3)
x_p = mlp:forward(x)
result = g:forward({x,x_p})
error = torch.rand(result:size())
gradient1 = g:backward(x, error) -- this is a table of 2 tensors
gradient2 = g:backward(x_p, error) -- this is also a table of 2 tensors
So what is wrong with my steps?
P.S. Perhaps I have found the reason: g:backward({x,x_p}, error) results in the same table. So I guess the two values stand for dy/dx and dy/dx_p respectively.
I think you simply made a mistake constructing your gModule. The gradInput of every nn.Module has to have exactly the same structure as its input; that is the way backprop works.
Here's an example of how to create a module like yours using nngraph:
require 'torch'
require 'nn'
require 'nngraph'
function CreateModule(input_size)
  local input = nn.Identity()() -- network input
  local nn_module_1 = nn.Linear(input_size, 100)(input)
  local nn_module_2 = nn.Linear(100, input_size)(nn_module_1)
  local output = nn.CMulTable()({input, nn_module_2})
  -- pack a graph into a convenient module with standard API (:forward(), :backward())
  return nn.gModule({input}, {output})
end
input = torch.rand(30)
my_module = CreateModule(input:size(1))
output = my_module:forward(input)
criterion_err = torch.rand(output:size())
gradInput = my_module:backward(input, criterion_err)
As I said, the gradInput of every nn.Module has to have exactly the same structure as its input. So, if you define your module as nn.gModule({input1, input2}, {score}), your gradInput (the result of the backward pass) will be a table of gradients w.r.t. input1 and input2, which in your case are x and x_p.
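If the goal is the total derivative with respect to x along both paths, the two table entries still have to be combined by the chain rule: the second entry (the gradient w.r.t. x_p) is pushed back through the path2 layer and added to the first. A minimal sketch of this, reusing the setup from the question; the names err, grads, grad_via_mlp and total are illustrative, and it assumes mlp is the only layer between x and x_p:

```lua
require 'torch'
require 'nn'
require 'nngraph'

-- Same two-input graph as in the question.
local input1 = nn.Identity()()
local input2 = nn.Identity()()
local score = nn.CAddTable()({nn.Linear(3, 5)(input1), nn.Linear(3, 5)(input2)})
local g = nn.gModule({input1, input2}, {score})
local mlp = nn.Linear(3, 3) -- path2 layer

local x = torch.rand(3, 3)
local x_p = mlp:forward(x)
local result = g:forward({x, x_p})

-- Named err rather than error to avoid shadowing Lua's built-in error().
local err = torch.rand(result:size())

-- gradInput mirrors the input: a table {grad w.r.t. x, grad w.r.t. x_p}.
local grads = g:backward({x, x_p}, err)

-- Propagate the second entry back through the path2 layer.
local grad_via_mlp = mlp:backward(x, grads[2])

-- Total gradient w.r.t. x: direct path plus the path through mlp.
local total = grads[1] + grad_via_mlp
```

Here grads[1] only covers the direct path from x into the graph, so it is not yet the full dy/dx.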
The only question that remains: why on Earth don't you get an error when you call:
gradient1 = g:backward(x, error)
gradient2 = g:backward(x_p, error)
One would expect an exception here, because the first argument must be a table of two tensors, not a single tensor. However, many Torch modules barely use the input argument when computing :backward(input, gradOutput); they usually keep the input from the last :forward(input) call and work with that instead. The argument matters so little to them that they don't even bother to validate it.
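Since the module itself does not validate the argument, one option is to check it yourself before calling :backward(). The helper below is hypothetical (checked_backward is not part of nn or nngraph) and is only a sketch assuming the module expects a flat table of n_inputs tensors:

```lua
require 'torch'

-- Hypothetical helper, not part of nn or nngraph: verify that `input` is a
-- table holding exactly `n_inputs` tensors, then delegate to module:backward().
local function checked_backward(module, input, gradOutput, n_inputs)
  -- A tensor is userdata in Lua, so this rejects a bare tensor.
  assert(type(input) == 'table', 'expected a table of tensors as input')
  assert(#input == n_inputs,
         'expected ' .. n_inputs .. ' inputs, got ' .. #input)
  for i = 1, n_inputs do
    assert(torch.isTensor(input[i]), 'input ' .. i .. ' is not a tensor')
  end
  return module:backward(input, gradOutput)
end
```

With the two-input gModule from the question, checked_backward(g, {x, x_p}, gradOutput, 2) would behave like g:backward({x, x_p}, gradOutput), whereas checked_backward(g, x, gradOutput, 2) would stop with an assertion error instead of silently returning a table.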