deep-learning · caffe · lasagne · regularized

L2 regularization in caffe, conversion from lasagne


I have some Lasagne code, and I want to create the same network using Caffe. I was able to convert the network itself, but I need help translating the hyperparameters. The hyperparameters in Lasagne look like this:

import theano.tensor as T
import lasagne

lr = 1e-2
weight_decay = 1e-5

# mean squared error between the network output and the targets
prediction = lasagne.layers.get_output(net['out'])
loss = T.mean(lasagne.objectives.squared_error(prediction, target_var))

# L2 penalty over all trainable parameters, scaled by weight_decay
weightsl2 = lasagne.regularization.regularize_network_params(net['out'], lasagne.regularization.l2)
loss += weight_decay * weightsl2
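
For context, this loss then feeds into a standard momentum update, roughly like this (a sketch; input_var is the symbolic network input, and the momentum value mirrors the solver settings below):

import theano

params = lasagne.layers.get_all_params(net['out'], trainable=True)
updates = lasagne.updates.momentum(loss, params, learning_rate=lr, momentum=0.9)
train_fn = theano.function([input_var, target_var], loss, updates=updates)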

How do I perform the L2 regularization part in Caffe? Do I have to add a regularization layer after each convolution/inner-product layer? The relevant parts of my solver.prototxt are below:

base_lr: 0.01
lr_policy: "fixed"
weight_decay: 0.00001
regularization_type: "L2"
stepsize: 300
gamma: 0.1  
max_iter: 2000
momentum: 0.9

Also posted at http://datascience.stackexchange.com; waiting for answers.


Solution

  • It seems like you already got it right.
    The weight_decay meta-parameter combined with regularization_type: "L2" in your 'solver.prototxt' tells Caffe to use L2 regularization with weight_decay = 1e-5.
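
    For intuition, L2 regularization here amounts to adding weight_decay * w to each parameter's gradient before the momentum update. Schematically (a plain-Python sketch of the idea, not Caffe's actual code path):

    def sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=1e-5):
        # Caffe-style L2 weight decay: the decay term is folded into the gradient
        v = momentum * v + lr * (grad + weight_decay * w)
        return w - v, v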

    One more thing you might want to tweak is how much the regularization affects each parameter. You can set this for each parameter blob in the net via

    param { decay_mult: 1 }
    

    For example, an "InnerProduct" layer with bias has two parameters:

    layer {
      type: "InnerProduct"
      name: "fc1"
      # bottom and top here
      inner_product_param { 
        bias_term: true
        # ... other params
      }
      param { decay_mult: 1 } # for weights use regularization
      param { decay_mult: 0 } # do not regularize the bias
    }
    

    By default, decay_mult is set to 1, i.e., all weights of the net are regularized equally. You can change this to regularize specific parameter blobs more or less.
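
    Note that the decay actually applied to a blob is the solver's weight_decay multiplied by that blob's decay_mult, for example:

    weight_decay = 1e-5                         # from solver.prototxt
    effective_decay_weights = weight_decay * 1  # decay_mult: 1 -> 1e-5
    effective_decay_bias    = weight_decay * 0  # decay_mult: 0 -> bias not regularized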