I have this code:
import numpy as np
def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])
## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)
output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)
## Backwards pass
## TODO: Calculate error
error = target - output
# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1 - output)
print("del_err_output", del_err_output)
# TODO: Calculate error gradient for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * hidden_layer_output * (1 - hidden_layer_output)
print("del_err_hidden", del_err_hidden)
print("del_err_hidden.shape", del_err_hidden.shape)
print("x", x)
print("x.shape", x.shape)
print("x[:,None]")
print(x[:,None])
print("x[:,None].shape", x[:,None].shape)
print("del_err_hidden * x[:, None]")
print(del_err_hidden * x[:, None])
that generates this output:
del_err_output 0.0287306695435
del_err_hidden [ 0.00070802 -0.00204471]
del_err_hidden.shape (2,)
x [ 0.5 0.1 -0.2]
x.shape (3,)
x[:,None]
[[ 0.5]
[ 0.1]
[-0.2]]
x[:,None].shape (3, 1)
del_err_hidden * x[:, None]
[[ 3.54011093e-04 -1.02235701e-03]
[ 7.08022187e-05 -2.04471402e-04]
[ -1.41604437e-04 4.08942805e-04]]
My problem is with this operation: del_err_hidden * x[:, None]
What kind of operation is * here?
And second, if del_err_hidden.shape is (2,) and x[:,None].shape is (3, 1), why can I multiply them?
Someone told me it is related to element-wise multiplication and broadcasting, but I don't understand those terms: to do an element-wise multiplication both arrays have to have the same shape, and here they don't.
Okay, I'm quoting the broadcasting rules from the documentation:
Two dimensions are compatible when
1) they are equal, or
2) one of them is 1
You have two arrays of shapes (2,) and (3, 1):

arr1 (1D) shape :     2
arr2 (2D) shape : 3 x 1
#                     ^
#                     | (cf. rule 2)
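As a side note, if you want NumPy to apply these rules for you, newer versions (assuming NumPy 1.20+, where np.broadcast_shapes is available) can report the broadcast result shape directly from the two shapes alone:

In [22]: import numpy as np

In [23]: np.broadcast_shapes((2,), (3, 1))   # trailing dims 2 and 1 are compatible
Out[23]: (3, 2)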
In [24]: err # shape (2,)
Out[24]: array([2, 4])
In [26]: x # shape (3, 1)
Out[26]:
array([[3],
[4],
[5]])
Broadcasting lines the shapes up on their trailing dimensions, so the 2 of err is compared with the 1 of x; since one of them is 1, rule 2 is satisfied, and the missing leading dimension of err is simply treated as 1. These arrays are therefore broadcastable and can be multiplied. The next part is stretching the arrays out (only conceptually, no copy is actually made), so that err becomes:
In [27]: err # shape (3, 2)
Out[27]:
array([[2, 4],
[2, 4],
[2, 4]])
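x gets stretched along the other axis in exactly the same way, so the multiplication you are asking about is just an ordinary element-wise product of two conceptually (3, 2) arrays. A small sketch to verify this, using np.broadcast_to and the same toy err and x as above (not your network's values):

In [28]: x_stretched = np.broadcast_to(x, (3, 2))        # x: (3, 1) -> (3, 2)

In [29]: err_stretched = np.broadcast_to(err, (3, 2))    # err: (2,) -> (3, 2)

In [30]: err_stretched * x_stretched                     # plain element-wise product
Out[30]:
array([[ 6, 12],
       [ 8, 16],
       [10, 20]])

In [31]: np.array_equal(err * x, err_stretched * x_stretched)
Out[31]: True

This is exactly what happens with del_err_hidden * x[:, None] in your code: the (2,) gradient is stretched across the rows and the (3, 1) input is stretched across the columns, giving the (3, 2) weight-gradient matrix you printed.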