About feval function on tutorials/2_supervised/4_train.lua


On GitHub: https://github.com/torch/tutorials/blob/master/2_supervised/4_train.lua there is an example of a script defining a training procedure. I'm interested in the construction of the feval function in this script.

-- create closure to evaluate f(X) and df/dX
local feval = function(x)
      -- get new parameters
      if x ~= parameters then
         parameters:copy(x)
      end

      -- reset gradients
      gradParameters:zero()

      -- f is the average of all criterions
      local f = 0

      -- evaluate function for complete mini batch
      for i = 1,#inputs do
          -- estimate f
          local output = model:forward(inputs[i])
          local err = criterion:forward(output, targets[i])
          f = f + err

          -- estimate df/dW
          local df_do = criterion:backward(output, targets[i])
          model:backward(inputs[i], df_do)

          -- update confusion
          confusion:add(output, targets[i])
      end

      -- normalize gradients and f(X)
      gradParameters:div(#inputs)
      f = f/#inputs

      -- return f and df/dX
      return f,gradParameters
      end

I tried to modify this function by removing the loop for i = 1,#inputs do ... So instead of doing the forward and backward pass input by input (inputs[i]), I do it for the whole mini-batch (inputs) at once. This really speeds up the process. This is the modified script:

-- create closure to evaluate f(X) and df/dX
local feval = function(x)
      -- get new parameters
      if x ~= parameters then
         parameters:copy(x)
      end

      -- reset gradients
      gradParameters:zero()

      -- f is the average of all criterions
      local f = 0
      -- evaluate function for complete mini batch

      -- estimate f
      local output = model:forward(inputs)
      local f = criterion:forward(output, targets)

      -- estimate df/dW
      local df_do = criterion:backward(output, targets)

      -- estimate df/dW
      model:backward(inputs, df_do)

      -- update confusion
      confusion:batchAdd(output, targets) 

      -- return f and df/dX
      return f,gradParameters
      end

But when I check in detail the values returned by feval (f, gradParameters) for a given mini-batch, I don't get the same result with and without the loop.

So my questions are: 1 - Why do we have this loop? 2 - Is it possible to get the same result without it?

Regards, Sam

NB: I'm a beginner in Torch7.


Solution

  • I'm sure you noticed that getting the second way to work requires a bit more than simply changing feval. In your second example, inputs needs to be a 4D tensor rather than a table of 3D tensors (unless something has changed since I last updated); see the sketch at the end of this answer. These tensors have different sizes depending on the loss criterion/model used. Whoever implemented the example must have thought the loop was the easier way to go here. In addition, ClassNLLCriterion does not seem to like batch processing (one would usually use CrossEntropyCriterion to get around this).

    All of this aside though, the two methods should give the same result. The only slight difference is that the first example uses the average error/gradient while the second uses the sum, as you can see from the normalization at the end of the loop version:

                       gradParameters:div(#inputs)
                       f = f/#inputs
    

    In the second case, f and gradParameters should therefore differ from the first only by a factor of opt.batchSize. The two are equivalent for optimization purposes, since the constant factor can be absorbed into the learning rate.
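
    For reference, here is a minimal sketch (not part of the tutorial) of how the mini-batch could be packed into the 4D tensor that the batched calls expect, together with the normalization needed to recover the loop's averaged values. The batchInputs/batchTargets names are made up for illustration, and CrossEntropyCriterion is just one batch-capable choice:

        -- sketch only: pack the Lua table of per-sample tensors into batch tensors
        -- assumes `inputs` is a table of 3D tensors and `targets` a table of
        -- class indices, as built earlier in 4_train.lua
        local sz = inputs[1]:size()
        local batchInputs  = torch.Tensor(#inputs, sz[1], sz[2], sz[3])
        local batchTargets = torch.Tensor(#inputs)
        for i = 1, #inputs do
           batchInputs[i]  = inputs[i]      -- copies the i-th sample into row i
           batchTargets[i] = targets[i]
        end

        -- a criterion that handles mini-batches, e.g.
        -- criterion = nn.CrossEntropyCriterion()

        -- inside feval, replace the loop with batched calls:
        --   local output = model:forward(batchInputs)
        --   local f      = criterion:forward(output, batchTargets)
        --   model:backward(batchInputs, criterion:backward(output, batchTargets))
        --   confusion:batchAdd(output, batchTargets)
        -- and, if the criterion sums rather than averages over the batch
        -- (sizeAverage = false), divide to match the loop version:
        --   gradParameters:div(batchInputs:size(1))
        --   f = f / batchInputs:size(1)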