Performing SVD with sklearn.decomposition.PCA: how can I get U, S, and V from it?

I perform SVD with sklearn.decomposition.PCA.

From the SVD equation

A = U S V^T

where V^T is the transpose of V (sorry, I can't paste the original equation).

If I want the matrices U, S, and V, how can I get them when I use sklearn.decomposition.PCA?

Solution

First of all, depending on the size of your matrix, the sklearn implementation of PCA will not always compute the full SVD. The following is taken from the PCA docstring in the GitHub repository:

svd_solver : string {'auto', 'full', 'arpack', 'randomized'}
        auto :
            the solver is selected by a default policy based on `X.shape` and
            `n_components`: if the input data is larger than 500x500 and the
            number of components to extract is lower than 80% of the smallest
            dimension of the data, then the more efficient 'randomized'
            method is enabled. Otherwise the exact full SVD is computed and
            optionally truncated afterwards.
        full :
            run exact full SVD calling the standard LAPACK solver via
            `scipy.linalg.svd` and select the components by postprocessing
        arpack :
            run SVD truncated to n_components calling ARPACK solver via
            `scipy.sparse.linalg.svds`. It requires strictly
            0 < n_components < X.shape[1]
        randomized :
            run randomized SVD by the method of Halko et al.

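In addition, it also manipulates the data before decomposing it: PCA centers X by subtracting the per-feature mean, so the SVD it computes is that of the centered data, not of X itself.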
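To make the effect of that preprocessing concrete, here is a minimal sketch that compares PCA's singular values with a hand-computed SVD of the raw and of the mean-centered data. It assumes a scikit-learn version that exposes `singular_values_` on a fitted PCA, and it forces `svd_solver="full"` so the exact solver is used; the variable names are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2], [3, 5], [8, 10], [-1, 1], [5, 6]], dtype=float)

# Center each column by hand, which is what PCA does before decomposing.
X_centered = X - X.mean(axis=0)

# Thin SVD of the raw data and of the centered data.
_, S_raw, _ = np.linalg.svd(X, full_matrices=False)
_, S_centered, Vt_centered = np.linalg.svd(X_centered, full_matrices=False)

pca = PCA(n_components=2, svd_solver="full").fit(X)

# Compare PCA's singular values against both decompositions.
print(np.allclose(pca.singular_values_, S_centered))
print(np.allclose(pca.singular_values_, S_raw))

# Rows of components_ agree with Vt of the centered data up to a sign per row.
print(np.allclose(np.abs(pca.components_), np.abs(Vt_centered)))
```

The comparison uses absolute values for the components because singular vectors are only defined up to sign, and sklearn applies its own sign convention.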

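Now, if you want the U, S, and V that sklearn.decomposition.PCA uses internally, you can call pca._fit(X). Note that _fit is a private method, so what it returns is not guaranteed to stay the same between versions. For example: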

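import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2], [3, 5], [8, 10], [-1, 1], [5, 6]])
pca = PCA(n_components=2)
# _fit is private; its return value may differ in other sklearn versions
print(pca._fit(X))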

prints

(array([[ -3.55731195e-01,   5.05615563e-01],
        [  2.88830295e-04,  -3.68261259e-01],
        [  7.10884729e-01,  -2.74708608e-01],
        [ -5.68187889e-01,  -4.43103380e-01],
        [  2.12745524e-01,   5.80457684e-01]]),
 array([ 9.950385  ,  0.76800941]),
 array([[ 0.69988535,  0.71425521],
        [ 0.71425521, -0.69988535]]))
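If you would rather not rely on a private method, the same three factors can be recovered from the public attributes of a fitted PCA. The sketch below assumes `whiten=False` (the default), a version of scikit-learn that provides `singular_values_`, and that no singular value is zero, since it divides by them.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2], [3, 5], [8, 10], [-1, 1], [5, 6]], dtype=float)
pca = PCA(n_components=2, svd_solver="full").fit(X)

S = pca.singular_values_   # shape (n_components,)
Vt = pca.components_       # shape (n_components, n_features)

# With whiten=False, transform(X) equals U * S, so divide each column by S.
U = pca.transform(X) / S   # shape (n_samples, n_components)

# U @ diag(S) @ Vt reproduces the centered data; adding the mean back gives X
# (exactly so here because all components are kept).
X_rebuilt = (U * S) @ Vt + pca.mean_
print(np.allclose(X_rebuilt, X))
```

When `n_components` is smaller than the rank of the centered data, the same expressions give the truncated factors, and `X_rebuilt` is then only an approximation of X.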

However, if you just want the SVD of the original data, I would suggest using scipy.linalg.svd directly.
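A minimal sketch of that route, using the same example matrix. `full_matrices=False` is chosen here to get the thin SVD, which is an assumption about what you need; drop it if you want the full square U.

```python
import numpy as np
from scipy.linalg import svd

X = np.array([[1, 2], [3, 5], [8, 10], [-1, 1], [5, 6]], dtype=float)

# svd returns U, the singular values as a 1-D array, and V already transposed.
U, S, Vt = svd(X, full_matrices=False)
print(U.shape, S.shape, Vt.shape)

# Multiplying the factors back together recovers the original (uncentered) X.
X_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(X_rebuilt, X))
```

Unlike PCA, this decomposes X as given, with no centering applied.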