I'm following an online tutorial on neural networks, neuralnetworksanddeeplearning.com. The author, Nielsen, implemented L2-regularization in the code as part of this tutorial. He then asks the reader to modify the code so that it uses L1-regularization instead of L2. This link will take you straight to the part of the tutorial I am talking about.
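The weight update rule with L2-regularization using stochastic gradient descent is as follows, where $m$ is the mini-batch size and $n$ is the size of the training set:
$$w \rightarrow \left(1 - \frac{\eta\lambda}{n}\right) w - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w}$$
And Nielsen implements it in Python as such: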
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
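The update rule with L1-regularization becomes (with $m$ the mini-batch size and $n$ the size of the training set):
$$w \rightarrow w - \frac{\eta\lambda}{n}\,\mathrm{sgn}(w) - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w}$$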
And I tried to implement it as follows:
    self.weights = [(w - eta* (lmbda/len(mini_batch)) * np.sign(w) - (eta/len(mini_batch)) * nw)
                    for w, nw in zip(self.weights, nabla_w)]
Suddenly my neural network's classification accuracy is at roughly chance level. How can this be? Did I make a mistake in my implementation of L1-regularization? I have a neural network with 30 hidden neurons, a learning rate of 0.5, and lambda = 5.0. When I use L2-regularization, everything is fine.
For your convenience please find the entire update function here:
    def update_mini_batch(self, mini_batch, eta, lmbda, n):
        """Update the network's weights and biases by applying gradient
        descent using backpropagation to a single mini batch.  The
        ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
        learning rate, ``lmbda`` is the regularization parameter, and
        ``n`` is the total size of the training data set.
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
Your math is off. The formula you want to implement translates to the following code:
    self.weights = [
        # L1 penalty scaled by n; gradient term averaged over the mini-batch
        (w - eta * (lmbda / n) * np.sign(w) - (eta / len(mini_batch)) * nw)
        for w, nw in zip(self.weights, nabla_w)]
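To see how much the choice of divisor matters, here is a minimal standalone sketch. The helper `l1_update` is a hypothetical name, not part of Nielsen's code, and the values `n = 50000` and `batch_size = 10` are assumptions (the question only gives `eta = 0.5` and `lmbda = 5.0`). It applies the L1 rule to plain NumPy arrays and compares the size of the penalty step under the two scalings.

```python
import numpy as np


def l1_update(weights, nabla_w, eta, lmbda, n, batch_size):
    """One SGD step with L1 regularization (hypothetical standalone helper).

    weights and nabla_w are lists of arrays with matching shapes; nabla_w
    holds gradients summed over the mini-batch, as in update_mini_batch.
    """
    return [w - eta * (lmbda / n) * np.sign(w) - (eta / batch_size) * nw
            for w, nw in zip(weights, nabla_w)]


# eta and lmbda come from the question; n and batch_size are assumed values.
eta, lmbda, n, batch_size = 0.5, 5.0, 50000, 10

# Size of the L1 penalty step per update under each scaling.
step_with_n = eta * lmbda / n               # dividing by the training set size
step_with_batch = eta * lmbda / batch_size  # dividing by len(mini_batch)

# Toy check with zero gradients, so only the L1 penalty moves the weights.
weights = [np.array([[0.03, -0.02], [0.0, 0.01]])]
nabla_w = [np.zeros((2, 2))]
updated = l1_update(weights, nabla_w, eta, lmbda, n, batch_size)

print(step_with_n, step_with_batch)
print(updated[0])
```

Under these assumed values the penalty step is 5e-5 when dividing by `n` but 0.25 when dividing by `len(mini_batch)`. A step of 0.25 per update is likely far larger than most of the weights themselves, so the penalty would swamp the gradient signal, which would plausibly explain the chance-level accuracy.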
The two required modifications are: