I am following this demo- https://github.com/torch/demos/blob/master/linear-regression/example-linear-regression.lua
feval = function(x_new)
-- set x to x_new, if different
-- (in this simple example, x_new will typically always point to x,
-- so the copy is really useless)
if x ~= x_new then
x:copy(x_new)
end
-- select a new training sample
_nidx_ = (_nidx_ or 0) + 1
if _nidx_ > (#data)[1] then _nidx_ = 1 end
local sample = data[_nidx_]
local target = sample[{ {1} }] -- this funny looking syntax allows
local inputs = sample[{ {2,3} }] -- slicing of arrays.
dl_dx:zero()
local loss_x = criterion:forward(model:forward(inputs), target)
model:backward(inputs, criterion:backward(model.output, target))
return loss_x, dl_dx
end
I have a few doubts about this function, for example what does

_nidx_ = (_nidx_ or 0) + 1

mean?

EDIT: My point #4 is very clear now. For those who are interested- (source: Deep Learning, Oxford, Practical 3 lab sheet)
Where is the argument x_new (or its copy x) used in the code?

x is the tensor of parameters of your model. It was previously acquired via x, dl_dx = model:getParameters(). model:forward() and model:backward() automatically use this parameter tensor. x_new is a new set of parameters for your model and is provided by the optimizer (SGD). If it is ever different from your model's parameter tensor, your model's parameters are set to these new values via x:copy(x_new) (an in-place copy of the values of x_new into x).
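For context, a minimal sketch of how this fits together (assuming require 'optim' and demo-style names such as data and sgd_params): the optimizer is handed the very tensor returned by getParameters(), which is why x_new inside feval is normally the same object as x and the copy is skipped.

x, dl_dx = model:getParameters()
sgd_params = {learningRate = 1e-3}

for i = 1, (#data)[1] do
   -- optim.sgd calls feval(x), uses the returned loss and gradient,
   -- and updates x in place with a gradient step
   _, fs = optim.sgd(feval, x, sgd_params)
end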
What does _nidx_ = (_nidx_ or 0) + 1 mean?

It increases the value of _nidx_ by 1 ((_nidx_) + 1), or sets it to 1 ((0) + 1) if _nidx_ was not yet defined.
What is the value of _nidx_ when the function is first called?

It is never set before that function is called. Variables which have not yet been assigned a value are nil in Lua.
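A minimal standalone illustration of this idiom (plain Lua; the counter name is only for this example):

print(counter)                -- nil: counter was never assigned
counter = (counter or 0) + 1
print(counter)                -- 1: nil is falsy, so "or 0" supplies the default
counter = (counter or 0) + 1
print(counter)                -- 2: from now on the existing value is incremented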
Where is dl_dx updated? Ideally it should happen right after loss_x is computed, but it is not written explicitly.

dl_dx is the model's tensor of gradients. model:backward() computes the gradient per parameter given a loss and adds it to the model's gradient tensor. As dl_dx is the model's gradient tensor, its values are updated there. Notice that the gradients are added (accumulated), which is why you need to call dl_dx:zero() (which sets the values of dl_dx to zero in place); otherwise your gradient values would keep growing with every call of feval.
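A small sketch of that accumulation behaviour, using a hypothetical two-input linear model rather than the demo's exact model:

require 'nn'

local m = nn.Linear(2, 1)
local p, gp = m:getParameters()   -- gp plays the role of dl_dx
local crit = nn.MSECriterion()
local input, target = torch.randn(2), torch.randn(1)

gp:zero()
crit:forward(m:forward(input), target)
m:backward(input, crit:backward(m.output, target))
print(gp:norm())                  -- gradient magnitude after one backward pass

-- a second backward pass WITHOUT gp:zero() adds to the existing gradients
crit:forward(m:forward(input), target)
m:backward(input, crit:backward(m.output, target))
print(gp:norm())                  -- doubled: the gradients accumulate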