so here's my solver.prototxt:
net: "models/dcnf-fcsp-alexnet/train_val.prototxt"
#test_iter: 1000
#test_interval: 1000
test_initialization: false
base_lr: 0.0001
lr_policy: "step"
gamma: 0.01
stepsize: 50000
display: 20
max_iter: 1000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "/data/lesi/dcnf-fcsp-alexnet/"
type: "Adam"
solver_mode: GPU
It clearly should set the type to Adam. Yet when I run training with this solver, my supervisor pointed out that it looks like it's using SGD (since the log says sgd_solver.cpp):
I0728 16:18:59.490665 27998 sgd_solver.cpp:106] Iteration 41860, lr = 0.0001
I0728 16:19:26.414223 27998 solver.cpp:228] Iteration 41880, loss = 1.45618
I0728 16:19:26.414342 27998 solver.cpp:244] Train net output #0: loss = 1.45618 (* 1 = 1.45618 loss)
I0728 16:19:26.414355 27998 sgd_solver.cpp:106] Iteration 41880, lr = 0.0001
I0728 16:19:53.348322 27998 solver.cpp:228] Iteration 41900, loss = 1.44106
I0728 16:19:53.348362 27998 solver.cpp:244] Train net output #0: loss = 1.44106 (* 1 = 1.44106 loss)
Is this just console-output confusion, or am I actually using SGD? If so, why won't it switch to Adam? I don't see what other step is required here...
"Adam" is a special case of the "SGD" solver: using minibatches, each iteration gives a stochastic estimate of the local gradient. The different solver types differ only in how they use this stochastic estimate to update the weights, so log lines from sgd_solver.cpp are expected even when type: "Adam" is in effect.
Look at your 'solverstate' and 'caffemodel' snapshots: you'll notice that the 'solverstate' takes roughly twice the disk space of the 'caffemodel'. This is because the "Adam" solver stores, for each trainable parameter, estimates of the first and second moments of the gradient. Had you used the plain "SGD" solver, your 'caffemodel' and 'solverstate' would have about the same file size.
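To make the difference concrete, here is a minimal pure-Python sketch (not Caffe's actual C++ implementation) of the two update rules for a single scalar weight. It shows why plain SGD-with-momentum keeps one piece of history per weight, while Adam keeps two (the first- and second-moment estimates), which is what inflates the 'solverstate' snapshot. The function names, the toy gradient value, and the scalar simplification are all illustrative assumptions.

```python
import math

def sgd_momentum_step(w, grad, hist, lr=1e-4, momentum=0.9):
    # Plain "SGD": ONE history value per weight,
    # so solverstate is roughly the same size as caffemodel.
    hist = momentum * hist + lr * grad
    return w - hist, hist

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    # "Adam": TWO history values per weight (first and second
    # moment estimates), so solverstate is roughly twice caffemodel.
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad * grad     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# toy usage: one update of a single weight with gradient 0.5
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=0.5, m=m, v=v, t=1)
```

Both solvers consume the same stochastic gradient; only the bookkeeping differs, which is exactly why Caffe routes Adam through the shared sgd_solver.cpp machinery.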