Tags: python, neural-network, pytorch, loss-function

How to add L1 Regularization to PyTorch NN Model?


When searching for ways to implement L1 regularization in PyTorch models, I came across this question, which is now two years old, so I was wondering whether there is anything new on this topic.

I also found this recent approach to dealing with the missing L1 function. However, I don't understand how to use it for a basic NN as shown below.

import torch.nn as nn
import torch.nn.functional as F


class FFNNModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, dropout_rate):
        super(FFNNModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.dropout_rate = dropout_rate
        self.drop_layer = nn.Dropout(p=self.dropout_rate)
        # Build a stack of fully connected layers from the list of hidden sizes
        self.fully = nn.ModuleList()
        current_dim = input_dim
        for h_dim in hidden_dim:
            self.fully.append(nn.Linear(current_dim, h_dim))
            current_dim = h_dim
        self.fully.append(nn.Linear(current_dim, output_dim))

    def forward(self, x):
        for layer in self.fully[:-1]:
            x = self.drop_layer(F.relu(layer(x)))
        x = F.softmax(self.fully[-1](x), dim=0)
        return x

I was hoping simply putting this before training would work:

model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = _Regularizer(model)
regularizer = L1Regularizer(regularizer, lambda_reg=0.1)

with

out = model(inputs)
loss = criterion(out, target) + regularizer.__add_l1()

Does anyone understand how to apply these 'ready to use' classes?


Solution

  • I haven't run the code in question, so please get back to me if something doesn't work exactly as described. Generally, I would say that the code you linked is needlessly complicated (possibly because it tries to be generic and support several different kinds of regularization). The way it is meant to be used is, I suppose,

    model = FFNNModel(30,5,[100,200,300,100],0.2)
    regularizer = L1Regularizer(model, lambda_reg=0.1)
    

    and then

    out = model(inputs)
    loss = criterion(out, target) + regularizer.regularized_all_param(0.)
    

    You can check that regularized_all_param just iterates over the parameters of your model and, for every parameter whose name ends with weight, accumulates the sum of its absolute values. For some reason the accumulator has to be initialized manually, which is why we pass in the 0. explicitly.
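
    For intuition, here is a minimal sketch of what that call effectively does; the method name and the lambda_reg argument come from the linked repo, but the body below is my own reconstruction rather than the repo's actual code:

    def l1_of_weights(model, reg_loss, lambda_reg=0.1):
        # Walk every named parameter; only tensors whose name ends with
        # "weight" contribute, so bias terms stay unregularized.
        for name, param in model.named_parameters():
            if name.endswith("weight"):
                reg_loss = reg_loss + lambda_reg * param.abs().sum()
        return reg_loss

    Called as l1_of_weights(model, 0.), this mirrors the regularizer.regularized_all_param(0.) call from the snippet above.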

    Really though, if you just want L1 regularization without any bells and whistles, the more manual approach, akin to your first link, will be more readable. It would go like this:

    # Accumulate the absolute values of every parameter, then scale the
    # penalty by a chosen strength and add it to the data loss.
    l1_lambda = 0.1
    l1_regularization = 0.
    for param in model.parameters():
        l1_regularization += param.abs().sum()
    loss = criterion(out, target) + l1_lambda * l1_regularization
    

    This is really what is at the heart of both approaches: you use the Module.parameters method to iterate over all model parameters and sum up their L1 norms, which then becomes a term in your loss function. That's it. The repo you linked comes up with some fancy machinery to abstract this away but, judging by your question, fails :)
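
    If it helps, here is an end-to-end sketch of a training step with that manual penalty; the criterion, optimizer, data loader and l1_lambda value are placeholders I am assuming, not something taken from your setup:

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()        # stand-in; swap in whatever criterion you already use
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
    l1_lambda = 0.1                          # assumed regularization strength
    for inputs, target in loader:            # `loader` is an assumed DataLoader
        optimizer.zero_grad()
        out = model(inputs)
        # L1 penalty over every parameter of the model
        l1_penalty = sum(p.abs().sum() for p in model.parameters())
        loss = criterion(out, target) + l1_lambda * l1_penalty
        loss.backward()
        optimizer.step()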