Tags: python, numpy, softmax

Could not determine shape of numpy array in a loop containing transpose operation


I have been trying to create a small neural network to learn the softmax function, following this article: https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html

It works well for a single iteration. But when I loop to train the network with the updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10).

Debugging this issue, I found that np.max() returns an array of shape (5,1) in some iterations and (1,5) in others, even though axis is set to 1. Please help me identify what went wrong in the following code.

import numpy as np

N = 5
D = 10
C = 10

W = np.random.rand(D,C)
X = np.random.randint(255, size = (N,D))
X = X/255
y = np.random.randint(C, size = (N))
#print (y)
lr = 0.1
reg = 1e-3   # regularization strength (value not shown in the original post)

for i in range(100):
  print (i)
  loss = 0.0
  dW = np.zeros_like(W)
  N = X.shape[0]
  C = W.shape[1]

  f = X.dot(W)
  #print (f)

  print (np.matrix(np.max(f, axis=1)))
  print (np.matrix(np.max(f, axis=1)).T)
  f -= np.matrix(np.max(f, axis=1)).T
  #print (f)  

  term1 = -f[np.arange(N), y]
  sum_j = np.sum(np.exp(f), axis=1)
  term2 = np.log(sum_j)
  loss = term1 + term2
  loss /= N 
  loss += 0.5 * reg * np.sum(W * W)
  #print (loss)

  coef = np.exp(f) / np.matrix(sum_j).T
  coef[np.arange(N),y] -= 1
  dW = X.T.dot(coef)
  dW /= N
  dW += reg*W

  W = W - lr*dW

Solution

  • In your first iteration, W is an instance of np.ndarray with shape (D, C). f inherits ndarray, so when you do np.max(f, axis = 1), it returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N), which is then transposed to (N, 1).

    But on your following iterations, W is an instance of np.matrix (which it inherits from dW in W = W - lr*dW). f then inherits np.matrix, and np.max(f, axis = 1) returns an np.matrix of shape (N, 1), which passes through np.matrix() unchanged and turns into shape (1, N) after .T. The first snippet at the end of this answer demonstrates the difference.

    To fix this, make sure you don't mix np.ndarray with np.matrix. Either define everything as np.matrix from the start (e.g. W = np.matrix(np.random.rand(D,C))) or use keepdims to maintain your axes like:

    f -= np.max(f, axis = 1, keepdims = True)
    

    which will let you keep everything 2D without needing to cast to np.matrix. (Do the same for sum_j, i.e. np.sum(np.exp(f), axis=1, keepdims=True).) A full corrected loop is sketched below.
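
    To see why the shapes flip between iterations, here is a small, hypothetical repro (the variable names mirror your code, and the seed is arbitrary) that prints what np.max(..., axis=1) and the np.matrix(...).T cast produce for an ndarray versus an np.matrix:

    import numpy as np

    np.random.seed(0)
    N, D, C = 5, 10, 10
    X = np.random.rand(N, D)

    # First iteration: W (and therefore f) is a plain np.ndarray.
    W = np.random.rand(D, C)
    f = X.dot(W)                        # shape (5, 10), ndarray
    m = np.max(f, axis=1)
    print(type(m).__name__, m.shape)    # ndarray (5,)
    col = np.matrix(m).T
    print(col.shape)                    # (5, 1) -> broadcasts fine against (5, 10)
    f -= col                            # works

    # Later iterations: W has become an np.matrix, so f becomes one too.
    W = np.matrix(W)
    f = X.dot(W)                        # np.matrix of shape (5, 10)
    m = np.max(f, axis=1)
    print(type(m).__name__, m.shape)    # matrix (5, 1) -- already a 2D column
    col = np.matrix(m).T
    print(col.shape)                    # (1, 5) -> cannot broadcast against (5, 10)
    try:
        f -= col
    except ValueError as e:
        print(e)                        # operands could not be broadcast together ...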
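
    And for reference, a minimal sketch of the full training loop with keepdims applied to both the max and the sum, so everything stays an np.ndarray throughout. Two assumptions: reg = 1e-3 is a made-up regularization strength (the original post never shows its value), and the per-example losses are averaged with np.mean, which matches the sum-and-divide in the linked article:

    import numpy as np

    N, D, C = 5, 10, 10
    lr = 0.1
    reg = 1e-3                                           # assumed regularization strength

    W = np.random.rand(D, C)
    X = np.random.randint(255, size=(N, D)) / 255
    y = np.random.randint(C, size=N)

    for i in range(100):
        f = X.dot(W)                                     # (N, C), stays an ndarray
        f -= np.max(f, axis=1, keepdims=True)            # (N, 1) column, broadcasts correctly

        sum_j = np.sum(np.exp(f), axis=1, keepdims=True) # (N, 1)
        loss = np.mean(-f[np.arange(N), y] + np.log(sum_j[:, 0]))
        loss += 0.5 * reg * np.sum(W * W)

        coef = np.exp(f) / sum_j                         # (N, C), no np.matrix cast needed
        coef[np.arange(N), y] -= 1
        dW = X.T.dot(coef) / N + reg * W

        W = W - lr * dW                                  # W remains an np.ndarray
        if i % 20 == 0:
            print(i, loss)

    Keeping the reductions 2D with keepdims prevents the np.matrix type from leaking into W through dW, which is what made the max's shape flip between iterations in the first place.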