python · numpy · broadcasting

operands could not be broadcast together with shapes (100,3) (100,), why?


This is my first question on Stack Overflow, and my English is not very good, so thank you to everyone who reads through it and helps me ^_^

My question is about broadcasting. What I want to do is multiply each row of X by the number in the same row of XW.

X is a (100,3) array and XW is a column vector of shape (100,). Why can't they broadcast?

After I add XW = XW.reshape((X.shape[0], 1)), they can broadcast. Why? Is there any difference between (100,1) and (100,)?
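
For example, with dummy arrays of the same shapes, this reproduces the error:

import numpy as np

X = np.ones((100, 3))
XW = np.ones(100)                # shape (100,)
# X * XW                         # ValueError: operands could not be broadcast together with shapes (100,3) (100,)
X * XW.reshape((100, 1))         # works, result has shape (100, 3)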

I think the example above describes my question clearly. My full code is rather long, but here it is in case it helps:

import numpy as np
import matplotlib.pyplot as plt

class MyFirstMachineLearningAlgorithm:
    def StochasticGradientDescent(self, W, X, count=100, a=0.1):

        n = X.shape[0]
        for i in range(count):  # train for count iterations
            gradient = np.zeros(3)
            for j in range(n):
                gradient += X[j, :] * (1 - 2 * (X[j, :] @ W))

            W = W + a * gradient
            # renormalize W to unit length
            W = W / np.sqrt(W @ W)

        return W

    def BatchGradientDescent(self, W, X, count=100, a=0.1):
        for i in range(count):
            XW = X @ W
            XW = 1 - 2 * XW

            # XW = XW.reshape((X.shape[0], 1))  # uncommenting this line fixes the error
            gradient = X * XW  # ValueError: operands could not be broadcast together with shapes (100,3) (100,)
            gradient = np.sum(gradient, axis=0)

            W = W + a * gradient
            # renormalize W to unit length
            W = W / np.sqrt(W @ W)
        return W

    def train(self, count=100):
        self.W = self.BatchGradientDescent(self.W, self.X, count)

    def draw(self):
        draw_x = np.arange(-120, 120, 0.01)
        draw_y = -self.W[0] / self.W[1] * draw_x
        draw_y = draw_y - self.W[2] / self.W[1]  # shift the line by the intercept term
        plt.plot(draw_x, draw_y)
        plt.show()

    def __init__(self):
        array_size = (50, 2)
        array1 = np.random.randint(50, 100, size=array_size)
        array2 = np.random.randint(-100, -50, size=array_size)
        array = np.vstack((array1, array2))
        column = np.ones(100)
        self.X = np.column_stack((array, column))
        plt.scatter(array[:, 0], array[:, 1])
        self.W = np.array([1, 2, 3])
        self.W = self.W / np.sqrt((self.W @ self.W))

g = MyFirstMachineLearningAlgorithm()
g.train()
g.draw()


Solution

  • It's best to post error information as copy-and-pasted text, not an image. Still, an image is better than nothing.

    So the error occurs in the last line of this clip:

            XW = X @ W
            XW = 1 - 2 * XW
    
            #XW = XW.reshape((X.shape[0],1))
            gradient = X*XW
    

    Just from the function definition I can't tell the shapes of X and W. Apparently X is 2d, (100,n). If W is (n,), then X @ W will be (100,), with the sum-of-products taken over the n dimension. Read the np.matmul docs if that isn't clear.

    By the rules of broadcasting (look them up), if one array doesn't have as many dimensions as the other, leading dimensions are added as needed. Thus (100,) can become (1,100). But to avoid ambiguity, numpy will not add a trailing dimension; you have to provide that yourself. So the last line should become

     gradient = X * XW[:,None]
    

    or the equivalent using XW.reshape(-1, 1), or your commented-out reshape.
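
    All three spellings produce the same (100,1) shape; for example, with a small dummy array:

        XW = np.arange(5.0)
        XW[:, None].shape                     # (5, 1)
        XW.reshape(-1, 1).shape               # (5, 1)
        XW.reshape((XW.shape[0], 1)).shape    # (5, 1)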

    Because arrays can be 1d (or even 0d), terms like row vector or column vector have limited value in numpy. A 1d array can be thought of as a row vector in some cases, namely where this automatic leading dimension applies.
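
    For example, a (3,) array broadcasts against a (100,3) array along the last axis, exactly as if it were a (1,3) row:

        W = np.array([1, 2, 3])
        X = np.ones((100, 3))
        (X * W).shape    # (100, 3) -- W is treated as shape (1, 3)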


    In __init__,

        self.X = np.column_stack((array, column))
        self.W = np.array([1, 2, 3])
    

    X is (100,3) and W is (3,). X@W is then (100,).

    In [45]: X=np.ones((100,3)); W=np.array([1,2,3])
    In [46]: (X@W).shape
    Out[46]: (100,)
    In [47]: X * (1+(X@W)[:,None]);
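
    Putting it together, the fixed gradient computation in BatchGradientDescent looks something like this (continuing with the same dummy X and W):

    In [48]: XW = 1 - 2 * (X @ W)          # shape (100,)
    In [49]: gradient = np.sum(X * XW[:, None], axis=0)
    In [50]: gradient.shape
    Out[50]: (3,)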