I've been learning about neural networks from Michael Nielsen's book: http://neuralnetworksanddeeplearning.com/chap1.html.
In the section below, which updates the weights and biases:
def update_mini_batch(self, mini_batch, eta):
    # the zero vectors in question: running sums for the gradients
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        # add this example's gradient to the running sums
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in xrange(epochs):
        random.shuffle(training_data)
        # split the shuffled training data into mini-batches
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print "Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test)
        else:
            print "Epoch {0} complete".format(j)
What is the need to introduce the nabla_b and nabla_w zero vectors, when they're simply being added to dnb and dnw, which are NumPy arrays themselves? Isn't 0 + something = something? What is the need for a zero vector here for a single training example?
As a test, I removed the zero vectors and used dnb and dnw by themselves, and I failed to see any significant difference in the training.
Thank you.
Yes, you are right that 0 + something = something, but on the second iteration it becomes something + something_else = new value.
This happens inside the following loop:

    for x, y in mini_batch:

Here, on the first iteration of the loop (i.e. for the first training example in the mini-batch), nabla_w and nabla_b will be all zeros, but on the second and later iterations they will already hold the accumulated values from the previous examples.
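Here is a minimal sketch of that accumulation with made-up numbers (the delta_nabla_bs list is hypothetical, standing in for what self.backprop returns for each example):

    import numpy as np

    # hypothetical per-example gradients for one bias vector (a mini-batch of two)
    delta_nabla_bs = [np.array([0.5, -1.0]), np.array([0.25, 0.75])]

    nabla_b = np.zeros(2)          # the zero vector being asked about
    for dnb in delta_nabla_bs:
        nabla_b = nabla_b + dnb    # iteration 1: 0 + dnb; iteration 2: running sum + dnb

    print(nabla_b)                 # [ 0.75 -0.25], the summed gradient over the mini-batch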
Now let's consider the following code:

    nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
    nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
On the first iteration both nabla_b and nabla_w are all zeros, but the nb+dnb (and nw+dnw) sums update them, so they are no longer vectors of just zeros. By the second iteration nabla_b is no longer a zero vector: it already holds the first example's gradient, and each pass through the loop adds the current example's gradient on top. The zeros are simply the starting value of that running sum; at the end of the loop, nabla_b and nabla_w contain the total gradient over the whole mini-batch, which the eta/len(mini_batch) factor then turns into an average for the update.
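As for why you saw no significant difference after removing the zero vectors: if delta_nabla_b and delta_nabla_w are used directly, only the last example's gradient survives the loop, so each update follows a single example's gradient rather than the mini-batch average. Since the data is shuffled, that is still a reasonable stochastic gradient direction on average, just a noisier one, which may be why training still appeared to work. A minimal sketch contrasting the two, again with made-up gradient values:

    import numpy as np

    # hypothetical per-example gradients for one weight matrix (a mini-batch of three)
    grads = [np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]]), np.array([[5.0, 6.0]])]

    # with the zero vector and accumulation (what the book's code does):
    nabla_w = np.zeros((1, 2))
    for g in grads:
        nabla_w = nabla_w + g
    print(nabla_w / len(grads))    # [[3. 4.]], the average gradient over the mini-batch

    # without accumulation, only the last delta_nabla_w is left after the loop:
    print(grads[-1] / len(grads))  # approx. [[1.667 2.]], just the last example's gradient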