python, numpy, svd

Latent Space and SVD


I have a fairly large dataset on which I am computing the SVD, and I then want to reconstruct the matrix from it. The matrix has shape (33388, 104). I want to retain 80% of the energy, which corresponds to k=51.

When my function returns the reconstructed matrix, I get the following error:

operands could not be broadcast together with shapes (33388,51) (51,51)

img is a NumPy array loaded from an image file, and k is the number of singular vectors to use.

How can I correct my function to fix this error?

def rank_k_approx(img, k):
    """Return a rank-k approximation

    img: an image (as a 2D grayscale array)
    k: number of singular vectors used"""
    u, sigma, vt = np.linalg.svd(img)
    energy = np.linalg.norm(sigma)**2
    approx_energy = np.linalg.norm(sigma[:k])**2
    percentage = approx_energy*100/energy
    print ("Energy retained = %4.2f"%percentage)
    return u[:,:k]*np.diag(sigma[:k])*vt[:k,:]

Solution

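  • The outputs of np.linalg.svd are ndarray objects, for which the * operator is element-wise multiplication. NumPy therefore tries to broadcast u[:,:k], of shape (33388, 51), against np.diag(sigma[:k]), of shape (51, 51), and fails with the error you see. What you want is the matrix product, for which you need np.dot() or the @ operator.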

    The multiplication should be:

    u[:, :k].dot(np.diag(sigma[:k])).dot(vt[:k])
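For reference, here is a minimal sketch of the whole function with that fix applied, written with the @ operator. Two things in it are my own assumptions rather than part of the question: the full_matrices=False argument, which asks np.linalg.svd for the reduced SVD so that u has shape (m, n) instead of (m, m) (for 33388 rows the full u would likely take several gigabytes), and the small random matrix used as a stand-in for the real data.

```python
import numpy as np

def rank_k_approx(img, k):
    """Return a rank-k approximation of a 2D array.

    img: a 2D array (e.g. a grayscale image)
    k: number of singular vectors used
    """
    # Assumption: reduced SVD, so u is (m, n) rather than (m, m).
    u, sigma, vt = np.linalg.svd(img, full_matrices=False)

    # Energy is the sum of squared singular values.
    energy = np.sum(sigma ** 2)
    approx_energy = np.sum(sigma[:k] ** 2)
    print("Energy retained = %4.2f" % (100 * approx_energy / energy))

    # @ is the matrix product: (m, k) @ (k, k) @ (k, n) -> (m, n)
    return u[:, :k] @ np.diag(sigma[:k]) @ vt[:k, :]

# Hypothetical stand-in for the (33388, 104) matrix in the question.
rng = np.random.default_rng(0)
img = rng.standard_normal((200, 104))
approx = rank_k_approx(img, 51)
print(approx.shape)
```

The result has the same shape as the input, and passing k equal to the number of columns should reproduce the original matrix up to floating-point error.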