Tags: r, neural-network, deep-learning, mxnet

How to specify regularization parameter (L1 or L2) for a feed forward neural network in R using the mxnet package?


I am using the R mxnet package. Here is the code block I am currently using, but I am not sure how to specify the regularization.

dpLnModel <- mx.model.FeedForward.create(symbol             = out,
                                         X                  = trainX,
                                         y                  = trainY,
                                         ctx                = mx.cpu(),
                                         num.round          = numIter,
                                         eval.metric        = mx.metric.rmse,
                                         array.batch.size   = 50,
                                         array.layout       = "rowmajor",
                                         verbose            = TRUE,
                                         optimizer          = "rmsprop",
                                         eval.data          = list(data  = testX,
                                                                   label = testY
                                         ),
                                         initializer        = mx.init.normal(initValVar),
                                         epoch.end.callback = mx.callback.log.train.metric(5, logger)
)

Solution

  • As @leezu's answer says, you need to set weight decay to get L2 regularisation. In the R API, the argument you need is wd, e.g.

    dpLnModel <- mx.model.FeedForward.create(symbol             = out,
                                             X                  = trainX,
                                             y                  = trainY,
                                             ctx                = mx.cpu(),
                                             num.round          = numIter,
                                             eval.metric        = mx.metric.rmse,
                                             array.batch.size   = 50,
                                             array.layout       = "rowmajor",
                                             verbose            = TRUE,
                                             optimizer          = "rmsprop",
                                             wd                 = 0.00001)  # L2 weight-decay coefficient
    

    I think you can include any of the arguments from mx.opt.rmsprop here, since extra arguments are passed through to the optimizer. Note that the documentation there says the default value of wd is zero, i.e. no regularisation by default.
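    For intuition, weight decay is L2 regularisation applied at the optimizer level: at each step, wd * w is added to the gradient, which is equivalent to penalising (wd / 2) * ||w||². A minimal sketch in plain Python, using a plain-SGD update for simplicity (illustrative only, not mxnet code; the function name is hypothetical):

    ```python
    # Sketch of what the `wd` argument does: the optimizer adds wd * w to
    # each gradient before the update, which is the gradient of the L2
    # penalty (wd / 2) * w**2. Plain SGD is used here for clarity; rmsprop
    # applies the same decay term on top of its adaptive update.

    def sgd_step_with_wd(w, grad, lr=0.01, wd=1e-5):
        """One plain-SGD update on weight w with L2 weight decay wd."""
        return w - lr * (grad + wd * w)

    w, g = 0.5, 0.2
    w_plain = w - 0.01 * g              # update without weight decay
    w_decay = sgd_step_with_wd(w, g)    # update with weight decay

    # The decayed update pulls the weight slightly further toward zero.
    assert w_decay < w_plain
    ```

    With wd = 0 the two updates coincide, which matches the documented default of no regularisation.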