I am working on understanding how to build my own ANN from scratch. I have looked around and found a simple two-layer architecture whose `init_parameters` function looks like this:
```python
import numpy as np

def init_parameters():
    W1 = np.random.normal(size=(10, 784)) * np.sqrt(1. / 784)  # (hidden, input)
    b1 = np.random.normal(size=(10, 1)) * np.sqrt(1. / 10)     # (hidden, 1)
    W2 = np.random.normal(size=(10, 10)) * np.sqrt(1. / 20)    # (output, hidden)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1. / 784)    # (output, 1)
    return W1, b1, W2, b2
```
This follows the MNIST data set. I understand that the input layer has 784 neurons, one for each pixel of a 28x28 input image. The hidden layer has 10 neurons, if I am reading the provided code correctly. Lastly, the output layer has 10 neurons, one for each digit from 0 to 9.
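As I understand it, the 784 comes from flattening each 28x28 image into a single column, so the data matrix X ends up with shape (784, m), one column per example, which is what `W1.dot(X)` expects. A tiny dummy-data sketch of that layout (the array here is just placeholder data, not real MNIST):

```python
import numpy as np

m = 5
images = np.random.rand(m, 28, 28)   # m fake 28x28 "images"

# flatten each image to a 784-vector and stack them as columns
X = images.reshape(m, 28 * 28).T
print(X.shape)                        # (784, 5)
```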
My goal is to increase the number of neurons in the hidden layer.
This is what I have tried:
```python
def init_parameters():
    W1 = np.random.normal(size=(20, 784)) * np.sqrt(1. / 784)
    b1 = np.random.normal(size=(10, 1)) * np.sqrt(1. / 10)
    W2 = np.random.normal(size=(10, 20)) * np.sqrt(1. / 20)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1. / 784)
    return W1, b1, W2, b2
```
This gives me an error when using my `forward_prop` function:
```python
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
```
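For completeness, `ReLU` and `softmax` are simple helpers along these lines (my versions, which may differ slightly from the tutorial's; they work column-wise since each example is a column):

```python
import numpy as np

def ReLU(Z):
    # element-wise max(0, z)
    return np.maximum(0, Z)

def softmax(Z):
    # subtract the column-wise max for numerical stability,
    # then normalize each column so it sums to 1
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)
```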
Error:

```
-> Z1 = W1.dot(X) + b1
ValueError: operands could not be broadcast together with shapes (20,29400) (10,1)
```
From the error it seems that `b1` does not match, so I thought changing it to `b1 = np.random.normal(size=(20, 1)) * np.sqrt(1./10)` would work, and it did.
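Reproducing just the broadcast in isolation made it clearer to me: NumPy can add a (20, 1) column to a (20, m) matrix, but not a (10, 1) one (shapes here are made up to match the error message):

```python
import numpy as np

Z = np.zeros((20, 29400))     # shape of W1.dot(X) after widening W1
b_right = np.zeros((20, 1))   # one bias per hidden unit -> broadcasts fine
b_wrong = np.zeros((10, 1))   # old bias shape

print((Z + b_right).shape)    # (20, 29400)
# Z + b_wrong                 # raises: operands could not be broadcast together
```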
The new `init_parameters` is this:
```python
def init_parameters():
    W1 = np.random.normal(size=(20, 784)) * np.sqrt(1. / 784)
    b1 = np.random.normal(size=(20, 1)) * np.sqrt(1. / 10)
    W2 = np.random.normal(size=(10, 20)) * np.sqrt(1. / 20)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1. / 784)
    return W1, b1, W2, b2
```
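One way to double-check is to push a batch of dummy data through `forward_prop` (using the helper definitions above) and look at the shapes:

```python
import numpy as np

W1, b1, W2, b2 = init_parameters()
X_dummy = np.random.rand(784, 32)   # 32 fake examples, one per column

Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X_dummy)
print(A1.shape)  # (20, 32) -> 20 hidden activations per example
print(A2.shape)  # (10, 32) -> one probability per digit class
```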
Did I actually increase the number of neurons in the hidden layer, and is this the right approach for what I am trying to achieve?
Yes, you added 10 more hidden units (neurons), and there is nothing wrong with your approach. `W1` and `b1` contain the weights and biases of your first layer, and each row of those matrices represents a hidden unit. This matches the notation used in Andrew Ng's Deep Learning course (which is similar to yours): notice the correspondence between the number of hidden units and the number of rows in the matrices.
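If you want to make that correspondence explicit, one option is to pass the hidden-layer size as a parameter so that every shape depending on it changes together. A sketch of that idea (the argument names are illustrative, and the scaling here just uses each layer's fan-in rather than the exact constants in your snippet):

```python
import numpy as np

def init_parameters(n_hidden=20, n_input=784, n_output=10):
    # rows of W1/b1 = number of hidden units; rows of W2/b2 = number of outputs
    W1 = np.random.normal(size=(n_hidden, n_input)) * np.sqrt(1. / n_input)
    b1 = np.random.normal(size=(n_hidden, 1)) * np.sqrt(1. / n_input)
    W2 = np.random.normal(size=(n_output, n_hidden)) * np.sqrt(1. / n_hidden)
    b2 = np.random.normal(size=(n_output, 1)) * np.sqrt(1. / n_hidden)
    return W1, b1, W2, b2
```

With that, changing the hidden-layer width is a single argument, and the bias shapes can never drift out of sync with the weight matrices the way `b1` did in your first attempt.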