machine-learning scikit-learn pca

What does sklearn PCA do to the input array when the number of components is chosen to be the same as the number of features?


For example, we have:

from sklearn.decomposition import PCA
import numpy as np 

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
pca.fit_transform(xx)

Output:

array([[ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385],
       [-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385]])

In this case I am not reducing the number of dimensions, yet the array is changed. Why?


Solution

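  • PCA applies a linear transformation to your feature space: it centers the data and then re-expresses it in the basis of the principal axes. When you keep all components, nothing is discarded, so the transformation is just a rotation (possibly combined with a reflection) of the centered data. In your case the data already has zero mean, so if feature 1 lies along x and feature 2 along y, the result is the same as rotating your feature vectors through an angle of theta ≈ 2.565 radians. Below I've defined such a rotation matrix to show that you get the same result: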

    import numpy as np

    def rot_matrix(theta):
        # Return the 2-D matrix for a counter-clockwise rotation by theta (radians)
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    theta = 2.565
    rot = rot_matrix(theta)
    # xx is the array defined in the question; rotate each row (sample) of it
    np.dot(rot, xx.T).T
    

    The result is close to the output of the PCA transform (the small differences come from rounding theta to three decimals):

    array([[ 1.38349574,  0.29315446],
           [ 2.22182084, -0.25201619],
           [ 3.60531658,  0.04113827],
           [-1.38349574, -0.29315446],
           [-2.22182084,  0.25201619],
           [-3.60531658, -0.04113827]])
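
To see where that rotation comes from, here is a minimal NumPy-only sketch of the steps described above: center the data, take the principal axes from an SVD of the centered data, and project onto them. It is an illustration under stated assumptions rather than scikit-learn's actual code path. The variable names (`centered`, `components`, `projected`, `posted`) are made up for this example, and `posted` is simply the output copied from the question. The sign of each principal axis is arbitrary, so the comparison is on magnitudes only.

```python
import numpy as np

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# Step 1: center each feature (a no-op here, both columns already have mean 0).
centered = xx - xx.mean(axis=0)

# Step 2: the principal axes are the right singular vectors of the centered data.
_, _, components = np.linalg.svd(centered, full_matrices=False)

# Step 3: project onto the axes. With every component kept this is only a
# change of basis, so no information is lost.
projected = centered @ components.T

# The axes form an orthogonal matrix, so the length of each sample is preserved.
is_orthogonal = np.allclose(components @ components.T, np.eye(2))
norms_preserved = np.allclose(np.linalg.norm(centered, axis=1),
                              np.linalg.norm(projected, axis=1))

# Output of pca.fit_transform(xx) as posted in the question.
posted = np.array([[ 1.38340578,  0.2935787 ],
                   [ 2.22189802, -0.25133484],
                   [ 3.6053038 ,  0.04224385],
                   [-1.38340578, -0.2935787 ],
                   [-2.22189802,  0.25133484],
                   [-3.6053038 , -0.04224385]])

# Axis signs are arbitrary, so compare absolute values only.
matches_posted = np.allclose(np.abs(projected), np.abs(posted), atol=1e-6)

print(is_orthogonal, norms_preserved, matches_posted)
```

All three checks should come out true: the values change because the coordinate axes change, while the lengths of the samples stay the same.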