
Numpy array broadcasting vs explicit dimension expansion space inefficiency


Why is explicit dimension expansion so space-inefficient compared to leveraging numpy's implicit broadcasting, which technically does the same thing, i.e. repeats the matrix along a given dimension?

I have two arrays: X (500, 3072) and X_train (5000, 3072). I want to calculate the distance of each of the 500 points in X from the 5000 points in X_train. When I try to do this via explicit dimension expansion, it takes over 60 GB of memory:

dists = np.linalg.norm(np.expand_dims(X, axis=1) - X_train, axis=2)
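For scale, the shape of the intermediate that this subtraction materializes can be checked without allocating it; a minimal sketch using np.broadcast_shapes (NumPy 1.20+):

    import numpy as np

    # Broadcasting aligns trailing axes: (5000, 3072) behaves like
    # (1, 5000, 3072), so the subtraction implies a full
    # (500, 5000, 3072) difference array.
    print(np.broadcast_shapes((500, 1, 3072), (5000, 3072)))  # (500, 5000, 3072)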

Whereas if I leverage numpy's broadcasting, it gets done within MBs of memory:

dists = np.square(X).sum(axis=1, keepdims=True) + np.square(X_train).sum(axis=1, keepdims=True).T - 2 * np.dot(X, X_train.T)
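Here no term ever grows beyond (500, 5000); a quick shape check of the pieces (a sketch mirroring the expression above):

    import numpy as np

    # The two squared-norm terms broadcast to the final shape...
    print(np.broadcast_shapes((500, 1), (1, 5000)))  # (500, 5000)
    # ...and the dot product (500, 3072) @ (3072, 5000) is also
    # (500, 5000), so the peak temporary is ~20 MB in float64.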

Why is the explicit dimension expansion so space-inefficient despite such small matrices being used?


Solution

  • Tracking dimensions in the two alternatives:

    X is (500, 3072) and X_train is (5000, 3072).

    (500,1,3072) - (1,5000,3072) => (500,5000,3072)

    norm squares this difference, sums on axis 2 (the 3072-long axis), and takes the sqrt, returning (500,5000).

    That 3d intermediate is quite large, about 61 GB in float64 (arithmetic below).
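    A quick back-of-the-envelope, assuming the default float64 dtype (8 bytes per element):

    import numpy as np

    itemsize = np.dtype(np.float64).itemsize  # 8 bytes
    temp3d = 500 * 5000 * 3072 * itemsize     # the (500,5000,3072) difference array
    temp2d = 500 * 5000 * itemsize            # a (500,5000) term in the second approach
    print(f"3d intermediate: {temp3d / 1e9:.1f} GB")  # 61.4 GB
    print(f"2d intermediate: {temp2d / 1e6:.1f} MB")  # 20.0 MB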

    np.square(X).sum(axis=1, keepdims=True) + np.square(X_train).sum(axis=1, keepdims=True).T - 2 * np.dot(X, X_train.T)

    (500,3072) => (500,1)
    (5000,3072) => (5000,1) => (1,5000)
    (500,1) + (1,5000) => (500,5000)
    dot: (500,3072) @ (3072,5000) => (500,5000)
    net: (500,5000)

    Nothing here is larger than (500,5000). Note that this expression returns squared distances; wrap it in np.sqrt to match the norm version exactly.
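    A small-scale sanity check that the two formulations agree (shapes shrunk so the 3d intermediate stays tiny; the arrays are illustrative stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 7))        # stand-in for (500, 3072)
    X_train = rng.standard_normal((9, 7))  # stand-in for (5000, 3072)

    # Explicit expansion: materializes a (5, 9, 7) difference array.
    d1 = np.linalg.norm(np.expand_dims(X, axis=1) - X_train, axis=2)

    # Norm expansion: ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y
    sq = (np.square(X).sum(axis=1, keepdims=True)
          + np.square(X_train).sum(axis=1, keepdims=True).T
          - 2 * np.dot(X, X_train.T))
    d2 = np.sqrt(sq)

    print(np.allclose(d1, d2))  # True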