machine-learning, lua, torch

Torch: How are model parameters updated?


Here is a toy model. I print the model parameters, call backward exactly once, and then print the parameters again. The parameters are unchanged. If I add the line model:updateParameters(<learning_rate>) after calling backward, I do see the parameters update.

But in the example code I've seen, such as https://github.com/torch/demos/blob/master/train-a-digit-classifier/train-on-mnist.lua, nobody actually calls updateParameters. It also doesn't look like optim.sgd, optim.adam, or nn.StochasticGradient ever call updateParameters. What am I missing here? How do the parameters get updated automatically? If I must call updateParameters, why do no examples do that?

require 'nn'
require 'optim'

local model = nn.Sequential()
model:add(nn.Linear(4, 1, false))
local params, grads = model:getParameters()

local criterion = nn.MSECriterion()
local inputs    = torch.randn(1, 4)
local labels    = torch.Tensor{1}

print(params)

model:zeroGradParameters()                         -- clear accumulated gradients
local output = model:forward(inputs)               -- forward pass
local loss   = criterion:forward(output, labels)   -- scalar loss
local dfdw   = criterion:backward(output, labels)  -- gradient of the loss w.r.t. the output
model:backward(inputs, dfdw)                       -- accumulate gradients w.r.t. the parameters

-- With the line below uncommented, the parameters are updated:
-- model:updateParameters(1000)

print(params)

Solution

  • backward() is not supposed to change the parameters; it merely computes the derivatives of the error function with respect to every parameter of the network.

    In general, training is a sequence of these steps:

    repeat
      local output = model:forward(input)                   -- see what the model predicts
      local loss = criterion:forward(output, answer)        -- see how wrong it is
      local loss_grad = criterion:backward(output, answer)  -- gradient of the loss w.r.t. the output
      model:backward(input, loss_grad)                      -- see how much each parameter is responsible for the error
      model:updateParameters(learningRate)                  -- adjust the parameters based on their gradients
      model:zeroGradParameters()                            -- the parameters have changed, so the old gradients are stale
    until is_user_satisfied()
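
    For concreteness, here is the same loop applied to the toy model from the question (a minimal sketch: the learning rate and the iteration count are arbitrary choices, and the gradients are zeroed at the top of each iteration, as in the question's snippet):

    require 'nn'

    local model = nn.Sequential()
    model:add(nn.Linear(4, 1, false))
    local criterion = nn.MSECriterion()

    local inputs       = torch.randn(1, 4)
    local labels       = torch.Tensor{1}
    local learningRate = 0.01

    for i = 1, 100 do
      model:zeroGradParameters()                            -- clear previously accumulated gradients
      local output    = model:forward(inputs)               -- forward pass
      local loss      = criterion:forward(output, labels)   -- scalar loss
      local loss_grad = criterion:backward(output, labels)  -- d(loss)/d(output)
      model:backward(inputs, loss_grad)                     -- accumulate d(loss)/d(parameters)
      model:updateParameters(learningRate)                  -- the step the question's snippet leaves out
    end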
    

    updateParameters implements the simplest optimization algorithm here (plain gradient descent). If so inclined, you may use your own update function instead. In theory, you could loop explicitly over the network's storages and update their values; in practice, you usually call getParameters():

    local model_parameters, model_parameters_gradient = model:getParameters()
    

    This yields flattened tensors of all the parameter values and all the gradients. These tensors are views into the network, so changes to them affect the network. You may not know which position in the tensor corresponds to which parameter, but most optimizers do not care about that.
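
    Because these flattened tensors are writable views, a plain gradient-descent step can be written directly against them; the following is a minimal sketch of what updateParameters(learningRate) effectively does (learningRate is assumed to be defined):

    local model_parameters, model_parameters_gradient = model:getParameters()

    -- model_parameters <- model_parameters - learningRate * model_parameters_gradient
    -- model_parameters is a view into the network, so this updates the model in place
    model_parameters:add(-learningRate, model_parameters_gradient)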

    The optim.sgd usage in the demo boils down to:

    optim.sgd(
       function_to_return_error_and_its_gradients, 
       model_parameters,
       optimizer_special_settings)
    

    The specifics are covered in the demo, but the relevant point here is that the optimizer receives model_parameters as an argument, which gives it write access to the network. It is not explicitly stated in the documentation, but the source code shows that the optimizer updates the values of its input tensor in place (note also that it returns the same tensor it received).
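
    Putting it together, a single optim.sgd step looks roughly like this (a sketch following the pattern of the linked MNIST demo; feval and sgdState are names chosen here for illustration, not part of the API):

    require 'optim'

    local params, gradParams = model:getParameters()
    local sgdState = {learningRate = 0.01}

    -- feval returns the current loss and the gradient of the loss w.r.t. params
    local feval = function(x)
      if x ~= params then params:copy(x) end
      gradParams:zero()
      local output = model:forward(inputs)
      local loss   = criterion:forward(output, labels)
      model:backward(inputs, criterion:backward(output, labels))
      return loss, gradParams
    end

    -- optim.sgd calls feval and then writes the updated values straight into params;
    -- params is a view into the model, so no explicit updateParameters() call is needed
    optim.sgd(feval, params, sgdState)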