I have a bunch of questions about how regularization and biases work in Caffe.
First, biases exist in the network by default, is that right? Or do I need to ask Caffe to add them?
Second, when Caffe computes the loss value, it does not include the regularization term, is that right? I mean the reported loss contains only the loss function value. As I understand it, regularization is considered only in the gradient calculation. Is that right?
Third, when Caffe computes the gradient, does it include the bias values in the regularization as well? Or does it only regularize the weights of the network?
Thanks in advance,
Afshin
For your 3 questions, my answers are:
1. Yes, biases exist in the network by default. For example, in ConvolutionParameter and InnerProductParameter in caffe.proto, the default value of bias_term is true, which means the Convolution and InnerProduct layers in a network have a bias by default.
2. Yes, the loss value does not include the regularization term. Regularization is only taken into account after net_->ForwardBackward(), in the ApplyUpdate() function, where the network parameters are updated.
3. Take a convolution layer in a network for example:
layer {
  name: "SomeLayer"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  # for weights
  param {
    lr_mult: 1
    decay_mult: 1.0  # regularization coefficient for the weights
                     # the default is 1.0; written out here for clarity
  }
  # for bias
  param {
    lr_mult: 2
    decay_mult: 1.0  # regularization coefficient for the bias
                     # the default is 1.0; written out here for clarity
  }
  ...  # rest of the layer definition omitted
}
The answer to this question is: when Caffe computes the gradient, the solver includes the bias values in the regularization only if both of these are larger than zero: the second decay_mult above and the weight_decay in solver.prototxt.
Details can be found in the function void SGDSolver::Regularize().
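As a rough illustration of the rule above, here is a minimal Python sketch of L2 weight decay applied per parameter blob. It is not Caffe's code: the function regularize_l2 and the toy values are made up for this example. It only mirrors the idea that the effective decay for a blob is weight_decay times that blob's decay_mult, and that the penalty is added to the gradient rather than to the reported loss.

```python
def regularize_l2(params, grads, weight_decay, decay_mults):
    """Add the L2 weight-decay term to each blob's gradient, in place.

    params, grads: lists of blobs, each blob a flat list of floats.
    decay_mults: one decay_mult per blob, e.g. [weights, bias].
    This is a hypothetical helper, not part of Caffe.
    """
    for blob, grad, decay_mult in zip(params, grads, decay_mults):
        local_decay = weight_decay * decay_mult
        if local_decay == 0:
            continue  # this blob is left unregularized
        for i, value in enumerate(blob):
            # Gradient of 0.5 * local_decay * value**2 with respect to value
            grad[i] += local_decay * value


# Toy layer with two blobs: weights and bias (values are arbitrary).
weights, bias = [1.0, -2.0], [0.5]

# Case 1: weight_decay and both decay_mult values are positive,
# so the bias is regularized together with the weights.
grads_bias_decayed = [[0.0, 0.0], [0.0]]
regularize_l2([weights, bias], grads_bias_decayed,
              weight_decay=0.5, decay_mults=[1.0, 1.0])

# Case 2: decay_mult for the bias is 0, so only the weights are regularized.
grads_bias_free = [[0.0, 0.0], [0.0]]
regularize_l2([weights, bias], grads_bias_free,
              weight_decay=0.5, decay_mults=[1.0, 0.0])

print(grads_bias_decayed)
print(grads_bias_free)
```

In this sketch, setting the second decay_mult to 0 is what keeps the bias out of the regularization, which corresponds to the condition described above.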
Hope this will help you.