How to use an adaptive learning rate in MXNet

The learning rate is critical to how well my network trains. With lr = 0.05, the train/validation accuracy oscillates severely; with lr = 0.025, I don't see any improvement until Epoch[30]. This reminded me of the adaptive learning rate in Caffe: I start with a base lr = 0.1 and, as training goes on, lr decays to 0.05, then 0.025, and smaller. Does MXNet have this strategy, and how can I use it?


  • You have a couple of options:

    One is to use a callback function that adjusts the learning rate at the end of each batch/epoch:

    import mxnet as mx
    import mxnet.optimizer as opt

    sgd_opt = opt.SGD(learning_rate=0.005, momentum=0.9, wd=0.0001, rescale_grad=(1.0/batch_size))
    model = mx.model.FeedForward(ctx=gpus, symbol=softmax, num_epoch=num_epoch,
                  optimizer=sgd_opt, initializer=mx.init.Uniform(0.07))
    def lr_callback(param):
        if param.nbatch % 10 == 0:
            sgd_opt.lr /= 10 # decrease learning rate by a factor of 10 every 10 batches
        print('nbatch:%d, learning rate:%f' % (param.nbatch, sgd_opt.lr))

    # train_dataiter is an assumed name for the training data iterator
    model.fit(X=train_dataiter, eval_data=test_dataiter, batch_end_callback=lr_callback)

    The other is to use one of the adaptive optimizers, such as AdaGrad or Adam:

    model = mx.model.FeedForward(
            ctx         = [mx.gpu(0)],
            num_epoch   = 60,
            symbol      = network,
            optimizer   = 'adam',
            initializer = mx.init.Xavier(factor_type="in", magnitude=2.34))

    model.fit(X=data_train)
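
For the schedule described in the question (0.1, then 0.05, then 0.025, and so on), the arithmetic can be sketched in plain Python without any MXNet dependency. This is only an illustration: the function name `step_decay_lr` and its parameters `step`, `factor`, and `min_lr` are hypothetical, and the 10-epoch interval is an assumption, since the question does not say how often the rate should drop.

```python
def step_decay_lr(base_lr, epoch, step=10, factor=0.5, min_lr=1e-8):
    """Return base_lr multiplied by `factor` once per `step` completed epochs.

    All names here are illustrative; the result is clamped at `min_lr`
    so the rate never decays all the way to zero.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    # Integer division counts how many decay intervals have fully elapsed.
    lr = base_lr * factor ** (epoch // step)
    return max(lr, min_lr)


if __name__ == "__main__":
    # Print the schedule at the start of each decay interval.
    for epoch in (0, 10, 20, 30):
        print("epoch %d: lr = %g" % (epoch, step_decay_lr(0.1, epoch)))
```

A value computed this way could be assigned to `sgd_opt.lr` inside an epoch-end callback, in the same spirit as the batch callback above. MXNet also ships `mx.lr_scheduler.FactorScheduler(step, factor)`, which applies the same kind of multiplicative decay but counts weight updates rather than epochs; it can be passed to an optimizer through its `lr_scheduler` argument. Check the exact signature against the version you have installed.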