I have been trying to create a small neural network that learns the softmax function, following an article from this website: https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html
It works well for a single iteration, but when I loop to train the network with the updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10).
While debugging this, I found that np.max() returns an array of shape (5,1) in one iteration and (1,5) in another, even though axis is set to 1 in both. Please help me identify what went wrong in the following code.
import numpy as np
N = 5
D = 10
C = 10
W = np.random.rand(D, C)
X = np.random.randint(255, size=(N, D))
X = X / 255
y = np.random.randint(C, size=(N))
#print (y)
lr = 0.1
reg = 1e-3  # regularization strength (assumed value; not given in the post)
for i in range(100):
    print(i)
    loss = 0.0
    dW = np.zeros_like(W)
    N = X.shape[0]
    C = W.shape[1]
    f = X.dot(W)  # class scores, shape (N, C)
    #print (f)
    print(np.matrix(np.max(f, axis=1)))
    print(np.matrix(np.max(f, axis=1)).T)
    f -= np.matrix(np.max(f, axis=1)).T  # shift scores for numeric stability
    #print (f)
    term1 = -f[np.arange(N), y]
    sum_j = np.sum(np.exp(f), axis=1)
    term2 = np.log(sum_j)
    loss = term1 + term2
    loss /= N
    loss += 0.5 * reg * np.sum(W * W)
    #print (loss)
    coef = np.exp(f) / np.matrix(sum_j).T  # softmax probabilities
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef)
    dW /= N
    dW += reg * W
    W = W - lr * dW
In your first iteration, W is an instance of np.ndarray with shape (D, C). f inherits ndarray, so when you do np.max(f, axis=1), it returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N), which is then transposed to (N, 1).
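You can verify this with a quick shape check (using N = 5 and C = 10 as in your code; the names here are just for illustration):

import numpy as np
f = np.random.rand(5, 10)                     # plain ndarray, as in the first iteration
print(np.max(f, axis=1).shape)                # (5,)  -- a 1-D array
print(np.matrix(np.max(f, axis=1)).shape)     # (1, 5)
print(np.matrix(np.max(f, axis=1)).T.shape)   # (5, 1) -- broadcasts fine against (5, 10)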
But on your following iterations, W is an instance of np.matrix (which it inherits from dW in W = W - lr*dW). f then inherits np.matrix, so np.max(f, axis=1) returns an np.matrix of shape (N, 1), which passes through np.matrix() unchanged and becomes shape (1, N) after .T. A (1, N) row cannot be broadcast against f, which has shape (N, C), hence the ValueError.
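Running the same check once f has become an np.matrix reproduces the broken shape:

import numpy as np
f = np.matrix(np.random.rand(5, 10))          # np.matrix, as in the later iterations
print(np.max(f, axis=1).shape)                # (5, 1) -- already 2-D
print(np.matrix(np.max(f, axis=1)).T.shape)   # (1, 5) -- cannot broadcast against (5, 10)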
To fix this, make sure you don't mix np.ndarray with np.matrix. Either define everything as np.matrix from the start (i.e. W = np.matrix(np.random.rand(D, C))), or use keepdims to maintain your axes, like:

f -= np.max(f, axis=1, keepdims=True)

which lets you keep everything 2D without needing to cast to np.matrix (do the same for sum_j).
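Putting it together, here is a minimal sketch of the corrected loop. Note that reg is never defined in your snippet, so a small value is assumed here, and the loss is reduced to a scalar with np.mean (your version leaves it as a per-example vector):

import numpy as np

N, D, C = 5, 10, 10
lr = 0.1
reg = 1e-3                                    # assumed regularization strength

W = np.random.rand(D, C)
X = np.random.randint(255, size=(N, D)) / 255
y = np.random.randint(C, size=N)

for i in range(100):
    f = X.dot(W)                              # class scores, shape (N, C), stays an ndarray
    f -= np.max(f, axis=1, keepdims=True)     # (N, 1) broadcasts cleanly against (N, C)
    sum_j = np.sum(np.exp(f), axis=1, keepdims=True)   # (N, 1)
    loss = np.mean(-f[np.arange(N), y] + np.log(sum_j[:, 0]))
    loss += 0.5 * reg * np.sum(W * W)
    coef = np.exp(f) / sum_j                  # softmax probabilities, (N, C)
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef) / N + reg * W          # gradient, shape (D, C)
    W = W - lr * dW                           # W stays an ndarray
    print(i, loss)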