Does Caffe apply the regularization parameter to biases?

I have a few questions about how regularization and biases work in Caffe.

First, do biases exist in the network by default, or do I need to ask Caffe to add them?

Second, when Caffe computes the loss value, it does not include the regularization term, is that right? I mean the loss contains only the loss function value. As I understand it, regularization is considered only in the gradient calculation. Is that right?

Third, when Caffe computes the gradient, does it include the bias values in the regularization as well, or only the weights of the network?

Thanks in advance,



  • My answers to your three questions:

    1. Yes. Biases exist in the network by default. For example, in ConvolutionParameter and InnerProductParameter in caffe.proto, the default value of bias_term is true, which means the convolution and inner product layers in the network have a bias by default.
    2. Yes. The loss value produced by the loss layer does not include the regularization term. Regularization is applied only after net_->ForwardBackward() has been called, in the ApplyUpdate() function, where the network parameters are updated.
    3. Take a convolution layer in a network for example:

      layer {
        name: "SomeLayer"
        type: "Convolution"
        bottom: "data"
        top: "conv"
        # for weights
        param {
          lr_mult: 1
          decay_mult: 1.0  # regularization coefficient for the weights
                           # (1.0 is the default; shown here for clarity)
        }
        # for bias
        param {
          lr_mult: 2
          decay_mult: 1.0  # regularization coefficient for the bias
                           # (1.0 is the default; shown here for clarity)
        }
        ...  # remaining fields omitted
      }

      The answer to this question is: when Caffe computes the gradient, the solver includes the bias values in the regularization only if both of the following are greater than zero: the second decay_mult above and the weight_decay in solver.prototxt.

      Details can be found in the function void SGDSolver::Regularize().
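To make the rule in point 3 concrete, here is a minimal Python sketch of the idea, assuming L2 regularization. It is not Caffe's actual code: the function name regularize_l2 and all the numbers are hypothetical. The effective decay for a blob is taken to be weight_decay times that blob's decay_mult, and the decay term is added to the gradient, not to the reported loss.

```python
def regularize_l2(values, grads, weight_decay, decay_mult):
    """Return the gradients with the L2 decay term added.

    values       -- current parameter values (weights or biases)
    grads        -- gradients coming from the loss layer alone
    weight_decay -- global setting from solver.prototxt
    decay_mult   -- per-blob multiplier from the layer's param block
    """
    local_decay = weight_decay * decay_mult
    if local_decay == 0:
        # Either setting is zero: this blob is not regularized.
        return list(grads)
    # L2 penalty 0.5 * local_decay * v^2 contributes local_decay * v.
    return [g + local_decay * v for g, v in zip(grads, values)]


# Hypothetical values, not taken from a real network.
weight_decay = 0.5        # solver.prototxt: weight_decay
bias = [2.0, -4.0]        # bias blob of some layer
bias_grad = [1.0, 1.0]    # gradient before regularization

decayed = regularize_l2(bias, bias_grad, weight_decay, decay_mult=1.0)
untouched = regularize_l2(bias, bias_grad, weight_decay, decay_mult=0.0)
```

With decay_mult set to 1.0 the bias gradient changes, and with decay_mult set to 0 it is left as the loss layer produced it, so setting the second decay_mult to 0 in the layer definition is one way to exclude the bias from regularization.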

    Hope this will help you.