
Python Numpy Matrix Multiplication / Shape Not Compatible / Defying Math?


Consider the following code:

import numpy as np

X = np.array([[1,1,1],
            [2,2,2],
            [3,3,3],
            [4,4,4]]) # shape (4,3)

print("Matrix X is:\n", X)


print("\n --------- TEST 1 ---------\n")

W = np.array([1,2,3]) # 1D array, shape (3,)
print("W is:", W)
print("X * W is:\n", X*W)

print("\n --------- TEST 2 ---------\n")

W = np.array([[1,2,3]]) # 2D array (shape 1,3)
print("W is:", W)
print("X * W is:\n", X*W)


print("\n --------- TEST 3 ---------\n")

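W = np.array([[1,2,3]]).T # 2D array (shape 3,1)
print("W is:\n", W)
try:
    print("X * W is:\n", X*W)
except ValueError as err:  # Test 3 raises here; catch it so Tests 4-6 still run
    print("X * W failed:", err)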


print("\n --------- TEST 4 ---------\n")

W = np.array([1,2,3]) # 1D array, shape (3,)
print("W is:\n", W)
print("X @ W is:\n", X@W.T) # matrix product; .T has no effect on a 1D array


print("\n --------- TEST 5 ---------\n")

W = np.array([[1,2,3]]) # 2D array (shape 1,3)
print("W is:\n", W)
print("X @ W is:\n", X@W.T) # matrix product with the (3,1) transpose of W


print("\n --------- TEST 6 ---------\n")

W = np.array([[1,2,3]]).T # 2D array (shape 3,1)
print("W is:\n", W)
print("X @ W is:\n", X@W) # matrix product

The output (omitting Test 3) is:

Matrix X is:
 [[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]]

 --------- TEST 1 ---------

W is: [1 2 3]
X * W is:
 [[ 1  2  3]
 [ 2  4  6]
 [ 3  6  9]
 [ 4  8 12]]

 --------- TEST 2 ---------

W is: [[1 2 3]]
X * W is:
 [[ 1  2  3]
 [ 2  4  6]
 [ 3  6  9]
 [ 4  8 12]]

 --------- TEST 4 ---------

W is:
 [1 2 3]
X @ W is:
 [ 6 12 18 24]

 --------- TEST 5 ---------

W is:
 [[1 2 3]]
X @ W is:
 [[ 6]
 [12]
 [18]
 [24]]

 --------- TEST 6 ---------

W is:
 [[1]
 [2]
 [3]]
X @ W is:
 [[ 6]
 [12]
 [18]
 [24]]

The Problem:

Test 3 will fail with the following error message:

ValueError: operands could not be broadcast together with shapes (4,3) (3,1)

However, mathematically this should be valid: multiplying an N-column matrix by an N-row matrix or vector is a well-defined operation, and here X has 3 columns while W has 3 rows. So what is going on?
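The operation described here (columns of X equal rows of W) is matrix multiplication, which NumPy spells `@` (or `np.matmul`), while `*` is elementwise. A minimal sketch contrasting the two on the same shapes as Test 3; `product` and `elementwise_failed` are illustrative names only:

```python
import numpy as np

X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3],
              [4, 4, 4]])        # shape (4, 3)
W = np.array([[1, 2, 3]]).T      # shape (3, 1), as in Test 3

# Matrix product: needs X.shape[1] == W.shape[0] (3 == 3) and gives shape (4, 1)
product = X @ W
print(product.ravel())           # [ 6 12 18 24]

# Elementwise product: needs broadcast-compatible shapes; (4, 3) and (3, 1) are not
try:
    X * W
    elementwise_failed = False
except ValueError:
    elementwise_failed = True
print(elementwise_failed)        # True
```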

I would expect this kind of error from Tests 1 and 2, although I can maybe understand how NumPy "reshapes" W so that they match up. But an error in Test 3? Really?

I assume this has something to do with broadcasting in NumPy? It seems mathematically counterintuitive. Does NumPy always need the columns of the multiplier (second factor, from now on "F2") to match the columns of the multiplicand (first factor, from now on "F1")? So it basically goes through each row of F1 and multiplies each column of that row by the corresponding column of F2?
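The row-by-row reading in that last question matches what `*` does in Tests 1 and 2. A short sketch spelling out the elementwise product with explicit loops (`manual` is just an illustrative name), assuming W has shape (3,) as in Test 1:

```python
import numpy as np

X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3],
              [4, 4, 4]])
W = np.array([1, 2, 3])  # shape (3,), as in Test 1

# Spell out the elementwise product: entry (i, j) is X[i, j] * W[j],
# i.e. the same W is reused ("broadcast") for every row of X.
manual = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        manual[i, j] = X[i, j] * W[j]

print(np.array_equal(manual, X * W))  # True
```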

To restate: mathematically, the requirement is cols(F1) = rows(F2). Here it seems to be cols(F1) = cols(F2).

And More:

Extending this to the dot product after Test 3, Test 6 technically should not work, and yet it does (and mathematically correctly so!).
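One way to see why Test 6 is the "mathematical" case: a matrix product sums over a shared index that runs along the columns of the first factor and the rows of the second. A sketch of Test 6 written out with explicit loops (names such as `result` are illustrative), checked against `@`:

```python
import numpy as np

X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3],
              [4, 4, 4]])        # shape (4, 3)
W = np.array([[1], [2], [3]])    # shape (3, 1), as in Test 6

# Matrix product by hand: result[i, j] = sum over k of X[i, k] * W[k, j].
# The shared index k runs over X's columns and W's rows, hence cols(X) == rows(W).
n, m = X.shape
m2, p = W.shape
assert m == m2, "inner dimensions must agree"
result = np.zeros((n, p), dtype=X.dtype)
for i in range(n):
    for j in range(p):
        for k in range(m):
            result[i, j] += X[i, k] * W[k, j]

print(np.array_equal(result, X @ W))  # True
```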

Other than that, Test 6 is mathematically sound to me, so no questions there.

Am I right in the following assumptions?

Test 4:

If F2 is a 1D row vector, the dot product still works exactly the same (as if it were a column vector), but it outputs a 1D row vector. Is that basically the only real difference with a multiplication like this? Under the hood, essentially the same work happens as if it were a 2D vector?
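A quick way to check the Test 4 assumption: per the `np.matmul` documentation, a 1D second operand is promoted to a matrix by appending a 1 to its shape, and that dimension is removed from the result. Strictly, a 1D array has shape (3,) and no row/column orientation, which is also why `.T` does nothing to it. A sketch comparing the 1D and (3, 1) cases (variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3],
              [4, 4, 4]])
w_1d = np.array([1, 2, 3])       # shape (3,)
w_col = w_1d.reshape(3, 1)       # shape (3, 1)

# .T does nothing to a 1D array, so X @ w_1d.T in Test 4 is just X @ w_1d
print(w_1d.T.shape)              # (3,)

out_1d = X @ w_1d                # 1D operand treated as a vector; result is 1D
out_col = X @ w_col              # ordinary (4, 3) @ (3, 1) matrix product
print(out_1d.shape, out_col.shape)              # (4,) (4, 1)
print(np.array_equal(out_1d, out_col.ravel()))  # True
```

The values are the same; only the shape of the result differs.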

Test 5:

So mathematically, I assume NumPy is internally "auto-transforming" F2 (a row) into a column, so Tests 5 and 6 are really the same?
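Worth noting for Test 5: the code itself applies `.T`, so the transposition is explicit rather than automatic. A sketch (assuming a (4, 3) matrix of ones for brevity) showing that without `.T` the (1, 3) array is rejected by `@`, and that with it Tests 5 and 6 multiply by identical arrays:

```python
import numpy as np

X = np.ones((4, 3), dtype=int)
W = np.array([[1, 2, 3]])        # shape (1, 3), as in Test 5

print(W.shape, W.T.shape)        # (1, 3) (3, 1)

# Without the explicit .T the inner dimensions (3 vs 1) disagree and matmul raises
try:
    X @ W
    raised = False
except ValueError:
    raised = True
print(raised)                    # True

# With the explicit .T, Test 5 and Test 6 use the same (3, 1) array
W6 = np.array([[1, 2, 3]]).T
print(np.array_equal(X @ W.T, X @ W6))  # True
```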

PS: While writing this question, I think I've realized that NumPy matches position against position, almost like a map grid system of sorts. However, the fact that Test 3 isn't allowed still baffles me, to be honest. It makes sense if you look at it as the "grids" not aligning with each other, but surely this could have been implemented? If Test 6 works, I don't see why Test 3 shouldn't.
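On the "surely this could have been implemented" point: a short sketch, using a square matrix `A` made up for illustration, of why `*` cannot simply switch to a matrix product whenever the shapes line up. For a (3, 3) and a (3, 1) operand, elementwise broadcasting is already valid and gives a different answer than `@`:

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])        # square, shape (3, 3)
w = np.array([[1], [2], [3]])    # shape (3, 1)

# Both operators accept these shapes, but they mean different things.
scaled = A * w    # broadcasting: row i of A is multiplied elementwise by w[i, 0]
product = A @ w   # matrix product: entry i is the sum over k of A[i, k] * w[k, 0]

print(scaled)
# [[1 1 1]
#  [4 4 4]
#  [9 9 9]]
print(product)
# [[ 6]
#  [12]
#  [18]]
```

Keeping `*` strictly elementwise and `@` strictly matrix-valued avoids that ambiguity.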

Thanks!


Solution

  • I think you are confusing the * operator with the @ operator.

    • The former is elementwise multiplication, so the shapes of the two operands must match or be broadcastable.
    • The latter is matrix multiplication, so you can only use it if X.shape[1] equals W.shape[0].

    The error you are getting in Test 3 is because NumPy broadcasting follows specific rules (see the NumPy broadcasting documentation): shapes are compared starting from the trailing dimension, and each pair of dimensions must be equal or one of them must be 1. For (4,3) and (3,1), the trailing pair (3 vs 1) is fine, but the next pair (4 vs 3) is not, so the operands cannot be broadcast. In Test 6, on the other hand, you are doing a genuine matrix multiplication, which works as expected.
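The broadcasting rule can be sketched in plain Python. `broadcast_shape` is a hypothetical helper written for illustration (not a NumPy function); the comparison against NumPy's own `np.broadcast_shapes` assumes NumPy 1.20 or newer:

```python
from itertools import zip_longest

import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply NumPy's broadcasting rule to two shapes.

    Shapes are compared from the trailing (rightmost) dimension; a missing
    dimension counts as 1. Each pair must be equal, or one of them must be 1.
    """
    result = []
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}: {a} vs {b}")
    return tuple(reversed(result))


# Shapes from Tests 1, 2 and 3
for a, b in [((4, 3), (3,)), ((4, 3), (1, 3)), ((4, 3), (3, 1))]:
    try:
        print(a, b, "->", broadcast_shape(a, b))
    except ValueError as err:
        print(a, b, "->", err)

print(broadcast_shape((4, 3), (3,)) == np.broadcast_shapes((4, 3), (3,)))  # True
```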