I have a 2x16x3x10x10 tensor that I feed into my network. The network has two parts that work in parallel. The first part takes a 16x3x10x10 slice and sums over the last two dimensions, returning a 16x3 tensor. The second part is a convolutional neural network that produces a 16x160 tensor. Whenever I try to run this model, I get the following error:
...903/nTorch/Torch7/install/share/lua/5.1/torch/Tensor.lua:457: expecting a contiguous tensor
stack traceback:
[C]: in function 'assert'
...903/nTorch/Torch7/install/share/lua/5.1/torch/Tensor.lua:457: in function 'view'
...8/osu7903/nTorch/Torch7/install/share/lua/5.1/nn/Sum.lua:26: in function 'updateGradInput'
...03/nTorch/Torch7/install/share/lua/5.1/nn/Sequential.lua:40: in function 'updateGradInput'
...7903/nTorch/Torch7/install/share/lua/5.1/nn/Parallel.lua:52: in function 'updateGradInput'
...su7903/nTorch/Torch7/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
...03/nTorch/Torch7/install/share/lua/5.1/nn/Sequential.lua:73: in function 'backward'
./train_v2_with_batch.lua:144: in function 'opfunc'
...su7903/nTorch/Torch7/install/share/lua/5.1/optim/sgd.lua:43: in function 'sgd'
./train_v2_with_batch.lua:160: in function 'train'
run.lua:93: in main chunk
[C]: in function 'dofile'
...rch/Torch7/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00405800
Here is the relevant part of the model:
local first_part = nn.Parallel(1,2)
local CNN = nn.Sequential()
local sums = nn.Sequential()
sums:add(nn.Sum(3))
sums:add(nn.Sum(3))
first_part:add(sums)
-- stage 1: conv+max
CNN:add(nn.SpatialConvolutionMM(nfeats, convDepth_L1, receptiveFieldWidth_L1, receptiveFieldHeight_L1))
-- Since the default stride of the receptive field is 1, then
-- (assuming receptiveFieldWidth_L1 = receptiveFieldHeight_L1 = 3) the number of receptive fields is (10-3+1)x(10-3+1) or 8x8
-- so the output volume is (convDepth_L1 X 8 X 8) or 10 x 8 x 8
--CNN:add(nn.Threshold())
CNN:add(nn.ReLU())
CNN:add(nn.SpatialMaxPooling(poolsize,poolsize,poolsize,poolsize))
-- if poolsize=2, then the output of this is 10x4x4
CNN:add(nn.Reshape(convDepth_L1*outputWdith_L2*outputWdith_L2,true))
first_part:add(CNN)
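For reference, the forward pass alone runs and gives me the shapes I expect. Here is a quick shape check (a sketch; it assumes nfeats = 3, convDepth_L1 = 10, receptiveFieldWidth_L1 = receptiveFieldHeight_L1 = 3, poolsize = 2 and outputWdith_L2 = 4, matching the comments above):
local x = torch.randn(2, 16, 3, 10, 10)
print(sums:forward(x[1]):size())     -- 16 x 3   (sum over the last two dimensions)
print(CNN:forward(x[2]):size())      -- 16 x 160 (conv + pool + reshape)
print(first_part:forward(x):size())  -- 16 x 163 (branch outputs joined along the 2nd dimension)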
The code works when the input tensor is 2x1x3x10x10, but not when the tensor is 2x16x3x10x10.
Edit: I only just realized that this happens when I do model:backward and not model:forward. Here is the relevant code:
local y = model:forward(x)
local E = loss:forward(y,yt)
-- estimate df/dW
local dE_dy = loss:backward(y,yt)
print(dE_dy)
model:backward(x,dE_dy)
x is a 2x16x3x10x10 tensor and dE_dy is 16x2.
This is a flaw in the torch.nn library. To perform a backward step, nn.Parallel splits the gradOutput it receives from the module above into pieces and sends them to its parallel submodules. The splitting is done without copying memory, so those pieces are non-contiguous (unless you split on the 1st dimension).
local first_part = nn.Parallel(1,2)
--                               ^
-- Merging on the 2nd dimension;
-- Chunks of the split gradOutput will not be contiguous
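You can see the same thing directly in the interpreter: slicing a tensor along any dimension other than the first yields a non-contiguous view. A small illustration (the 16x163 shape is just a stand-in for the gradOutput that reaches first_part, i.e. the 16x3 and 16x160 branch outputs joined along the 2nd dimension):
local g = torch.randn(16, 163)
print(g:narrow(2, 1, 3):isContiguous())    -- false: the 16x3 chunk for the sums branch
print(g:narrow(2, 4, 160):isContiguous())  -- false: the 16x160 chunk for the CNN branch
print(g:narrow(1, 1, 8):isContiguous())    -- true: a slice along the 1st dimension stays contiguous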
The problem is that nn.Sum cannot work with a non-contiguous gradOutput. I haven't got a better idea than to make changes to it:
Sum_nc, _ = torch.class('nn.Sum_nc', 'nn.Sum')

function Sum_nc:updateGradInput(input, gradOutput)
   local size = input:size()
   size[self.dimension] = 1
   -- modified code:
   if gradOutput:isContiguous() then
      gradOutput = gradOutput:view(size)           -- view only works on contiguous tensors
   else
      gradOutput = gradOutput:clone():resize(size) -- copy first, then reshape; slower because of the extra allocation
      -- gradOutput = gradOutput:resize(size)      -- NOT safe here: resizing a non-contiguous tensor merely
      --                                              reinterprets its storage, so the values can come out wrong
   end
   --
   self.gradInput:resizeAs(input)
   self.gradInput:copy(gradOutput:expandAs(input))
   return self.gradInput
end
[...]
sums = nn.Sequential()
sums:add(nn.Sum_nc(3)) -- <- will use torch.view
sums:add(nn.Sum_nc(3)) -- <- will use torch.resize
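With the patched modules, the backward pass that failed before should go through. A quick self-contained check (a sketch; first_part here is the container from the question, rebuilt with nn.Sum_nc in place of nn.Sum):
local x = torch.randn(2, 16, 3, 10, 10)
local out = first_part:forward(x)       -- 16 x 163
local gradOut = out:clone():normal()    -- a dummy gradient with the same shape as the output
first_part:backward(x, gradOut)         -- nn.Parallel still hands each branch a non-contiguous slice,
                                        -- but nn.Sum_nc now accepts it instead of hitting the contiguity assert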