lua · machine-learning · neural-network · backpropagation · torch

How to compute the gradient of the loss with respect to an arbitrary layer/weight in Torch?


I'm transitioning from Theano to Torch, so please bear with me. In Theano, it was fairly straightforward to compute the gradients of the loss function w.r.t. even a specific weight. How can one do this in Torch?

Assume we have the following code, which generates some data/labels and defines a model:

t = require 'torch'
require 'nn'
require 'cunn'
require 'cutorch'



-- Generate random labels
function randLabels(nExamples, nClasses)
    -- nClasses: number of classes
    -- nExamples: number of examples
    label = {}
    for i=1, nExamples do
        label[i] = t.random(1, nClasses)
    end
    return t.FloatTensor(label)
end

inputs = t.rand(1000, 3, 32, 32) -- 1000 samples, 3 color channels
inputs = inputs:cuda()
labels = randLabels(inputs:size()[1], 10)
labels = labels:cuda()

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))
net:add(nn.View(6*14*14))
net:add(nn.Linear(6*14*14, 300))
net:add(nn.ReLU())
net:add(nn.Linear(300, 10))
net = net:cuda()

-- Loss
criterion = nn.CrossEntropyCriterion()
criterion = criterion:cuda()
forwardPass = net:forward(inputs)
net:zeroGradParameters()
-- dEd_WeightsOfLayer1 = ???  -- How to compute this?



forwardPass = nil
net = nil
criterion = nil
inputs = nil
labels = nil

collectgarbage()

How can I compute the gradient of the loss w.r.t. the weights of the convolutional layer?


Solution

  • Okay, I found the answer (thanks to alban desmaison on the Torch7 Google group). The code in the question has a bug and does not work, so I rewrote it. Here's how you can get the gradients with respect to each node/parameter:

    t = require 'torch'
    require 'cunn'
    require 'nn'
    require 'cutorch'
    
    
    
    -- A function to generate some random labels
    function randLabels(nExamples, nClasses)
        -- nClasses: number of classes
        -- nExamples: number of examples
        label = {}
        for i=1, nExamples do
            label[i] = t.random(1, nClasses)
        end
        return t.FloatTensor(label)
    end
    
    -- Declare some variables
    nClass = 10
    kernelSize = 5
    stride = 2
    poolKernelSize = 2
    nData = 100
    nChannel = 3
    imageSize = 32
    
    -- Generate some [random] data
    data = t.rand(nData, nChannel, imageSize, imageSize) -- 100 Random images with 3 channels
    data = data:cuda() -- Transfer to the GPU (remove this line if you're not using GPU)
    label = randLabels(data:size()[1], nClass)
    label = label:cuda() -- Transfer to the GPU (remove this line if you're not using GPU)
    
    -- Define model
    net = nn.Sequential()
    net:add(nn.SpatialConvolution(nChannel, 6, kernelSize, kernelSize))
    net:add(nn.ReLU())
    net:add(nn.SpatialMaxPooling(poolKernelSize, poolKernelSize, stride, stride))
    net:add(nn.View(6*14*14))
    net:add(nn.Linear(6*14*14, 350))
    net:add(nn.ReLU())
    net:add(nn.Linear(350, nClass))
    net = net:cuda() -- Transfer to the GPU (remove this line if you're not using GPU)
    
    criterion = nn.CrossEntropyCriterion()
    criterion = criterion:cuda() -- Transfer to the GPU (remove this line if you're not using GPU)
    
    -- Do the forward/backward passes and compute the gradient for each node/parameter:
    
    net:zeroGradParameters() -- Clear any stale gradient values before accumulating new ones
    net:forward(data) -- Do the forward propagation
    criterion:forward(net.output, label) -- Compute the overall negative log-likelihood error
    criterion:backward(net.output, label); -- The trailing ';' stops the interactive th interpreter from printing the returned tensor
    net:backward(data, criterion.gradInput); -- Backpropagate the error; ';' again suppresses printing
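    -- After this backward pass every module in `net` holds its gradients:
    --   module.gradInput  : gradient of the loss w.r.t. that module's input
    --   module.gradWeight : gradient of the loss w.r.t. its weights (parametrised layers only)
    --   module.gradBias   : gradient of the loss w.r.t. its biases  (parametrised layers only)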
    
    -- Now you can access the gradient values
    
    layer1InputGrad = net:get(1).gradInput
    layer1WeightGrads = net:get(1).gradWeight
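    
    -- A small extension of the same pattern (the variable names here are only illustrative):
    -- the bias gradients and the gradients of any other parametrised layer can be read the
    -- same way, with the layer index following the order of the net:add() calls above.
    layer1BiasGrads = net:get(1).gradBias      -- d(loss)/d(bias) of the convolution layer
    linear1WeightGrads = net:get(5).gradWeight -- d(loss)/d(weight) of nn.Linear(6*14*14, 350)
    
    -- Alternatively, flatten all parameters and their gradients into two 1-D tensors; this
    -- is the layout the optim package expects. Note that getParameters() should normally be
    -- called only once per network, since it reallocates the parameter storages.
    params, gradParams = net:getParameters()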
    
    net = nil
    data = nil
    label = nil
    criterion = nil
    

    Copy and paste the code and it works like a charm :)