When searching for ways to implement L1 regularization in PyTorch models, I came across this question, which is now two years old, so I was wondering if there is anything new on this topic.
I also found this recent approach to dealing with the missing L1 function. However, I don't understand how to use it for a basic NN as shown below.
import torch.nn as nn
import torch.nn.functional as F

class FFNNModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, dropout_rate):
        super(FFNNModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.dropout_rate = dropout_rate
        self.drop_layer = nn.Dropout(p=self.dropout_rate)
        # build a stack of fully connected layers from the list of hidden sizes
        self.fully = nn.ModuleList()
        current_dim = input_dim
        for h_dim in hidden_dim:
            self.fully.append(nn.Linear(current_dim, h_dim))
            current_dim = h_dim
        self.fully.append(nn.Linear(current_dim, output_dim))

    def forward(self, x):
        # ReLU + dropout after every hidden layer, softmax on the output layer
        for layer in self.fully[:-1]:
            x = self.drop_layer(F.relu(layer(x)))
        x = F.softmax(self.fully[-1](x), dim=0)
        return x
I was hoping simply putting this before training would work:
model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = _Regularizer(model)
regularizer = L1Regularizer(regularizer, lambda_reg=0.1)
with
out = model(inputs)
loss = criterion(out, target) + regularizer.__add_l1()
Does anyone understand how to apply these 'ready to use' classes?
I haven't run the code in question, so please get back to me if something doesn't work exactly as described. Generally, I would say that the code you linked is needlessly complicated (probably because it tries to be generic and support several kinds of regularization). The way it is meant to be used is, I suppose,
model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = L1Regularizer(model, lambda_reg=0.1)
and then
out = model(inputs)
loss = criterion(out, target) + regularizer.regularized_all_param(0.)
You can check that regularized_all_param will just iterate over the parameters of your model and, for those whose name ends with weight, accumulate the sum of their absolute values. For some reason the accumulating buffer has to be initialized manually, which is why we pass in the 0..
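For intuition, something like the following sketch is roughly what I'd expect regularized_all_param to boil down to (the helper name l1_of_weights and the initial argument are mine, not the repo's actual API):

def l1_of_weights(model, initial=0.):
    # hypothetical helper: walk the named parameters and penalize
    # only those whose name ends with "weight" (biases are skipped)
    total = initial
    for name, param in model.named_parameters():
        if name.endswith('weight'):
            total = total + param.abs().sum()
    return total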
Really though, if you want to apply L1 regularization efficiently and don't need any bells and whistles, the more manual approach, akin to your first link, will be more readable. It would go like this:
# sum of absolute values of every parameter (weights and biases alike)
l1_regularization = 0.
for param in model.parameters():
    l1_regularization += param.abs().sum()
loss = criterion(out, target) + l1_regularization
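If it helps, here is roughly how that sits inside a training step; the coefficient lambda_l1 and the usual criterion/optimizer/inputs/target objects are placeholders I'm assuming, pick whatever strength fits your problem:

lambda_l1 = 0.1  # assumed regularization strength, tune for your problem

optimizer.zero_grad()
out = model(inputs)

l1_regularization = 0.
for param in model.parameters():
    l1_regularization += param.abs().sum()

# the L1 term is simply added to the data loss before backpropagation
loss = criterion(out, target) + lambda_l1 * l1_regularization
loss.backward()
optimizer.step()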
This is really what is at the heart of both approaches. You use the Module.parameters method to iterate over all model parameters and sum up their L1 norms, which then becomes a term in your loss function. That's it. The repo you linked comes up with some fancy machinery to abstract it away but, judging by your question, fails :)
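If you do want the repo's behaviour of penalizing only the weights and leaving the biases alone, a compact manual variant would be something like this (the 0.1 factor just mirrors the lambda_reg from your snippet):

l1_weight_only = sum(p.abs().sum() for name, p in model.named_parameters() if name.endswith('weight'))
loss = criterion(out, target) + 0.1 * l1_weight_only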