Search code examples
pythonnumpypcasvd

How change order of SVD using numpy python


I am using Singular Value Decomposition (SVD) for Principal Component Analysis (PCA) of images.

I have 17 images of 20 X 20 so I created images matrix

M =  dim(400 X 17)

and when I apply SVD ( M = u @ d @ v) it gives me

u = dim(400 X 17)
d = dim(17 X 17)   
v = dim(17 X 17)

but I want to find u = dim(400 X 400) and d =(400 X 400) and v =(400 X 17) because there will 400 eigen vectors and 400 eigen values.

I even tried doing transpose but no luck

I know maybe question's title not much clear so please change as you like, and here are some info related to data

  1. I centralized data by subtracting mean face

  2. I tried to work around by finding eigenvectors of covariance matrix(MM') but when I tried to show PCA1 it only shows a black image

Please help me out


Solution

  • Eigenvalues aren't defined for rectangular matrices, but the singular values are related. As for eigenvectors, you always have a set of right and left eigenvectors that span column and row space.

    SVD is related to the eigenvalue decomposition of MM' and M'M

    • M'M = V (S'S) V'
    • MM' = U (SS') U'

    now

    • The columns of V are eigenvectors of M'M, which is of size (17 x 17) in your case. Therefore V is (17 x 17)
    • The columns of U are eigenvectors of MM', which is of size (400 x 400) in your case. Therefore U is (400 x 400)

    Now what's the size of S? The non-zero elements of S (singular values) are the square roots of the non-zero eigenvalues of M'M and MM'. It can bew shown these two have the same sets of non-zero eigenvalues, therefore in the first case S is (17 x 17) and in the second case (400 x 400). How do we reconcile this with the fact that our SVD is M = USV'? We build a rectangular diagonal matrix (400 x 17) with the square roots of the 17 non-zero eigenvalues.

    You can use SVD from scipy:

    import scipy
    
    u, s, vh = scipy.linalg.svd(M, full_matrices=True)
    print(u.shape, s.shape, vh.shape)
    

    that gives

    ((400, 400), (17,), (17, 17))
    

    To get your S to (400 x 17):

    s = np.concatenate([np.diag(s), np.zeros((400-17, 17))], axis=0)
    

    Check SVD correctness:

    res = u@s@vh
    np.allclose(res, a)
    
    True
    

    Low rank matrix approximation

    Sometimes you want to approximate your matrix M with a low-rank M_tilde of rank r, in this case, if you want to minimize the Frobenius norm between the two, you just keep the r largest singular values (Eckhart-Young theorem). The sizes of U, S, V become: (400 x r), (r x r), (r x 17), where S is diagonal.

    I don't know which function you're using, but this is what's happening: the zero singular values are being thrown away, as an (m x n) matrix can have at most rank min(m, n) (in your case 17).