For example, we have:
from sklearn.decomposition import PCA
import numpy as np
xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
pca.fit_transform(xx)
Output:
array([[ 1.38340578, 0.2935787 ],
[ 2.22189802, -0.25133484],
[ 3.6053038 , 0.04224385],
[-1.38340578, -0.2935787 ],
[-2.22189802, 0.25133484],
[-3.6053038 , -0.04224385]])
In this case I am not reducing the number of dimensions, yet the array is changed. Why?
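PCA performs a linear transformation of your feature space. It first centres the data (subtracts the column means) and then projects it onto the principal axes. That projection is an orthogonal transformation, i.e. a rotation (possibly combined with a reflection). Keeping all components means no dimensions are dropped, but the points are now expressed in coordinates along the new axes, so the numbers change.
In your case the column means are already zero, so only the rotation remains. If feature 1 lies along x and feature 2 along y, the transformation is the same as rotating your feature vectors through an angle theta ≈ 2.565 radians. Below I've defined such a rotation matrix and show that it gives the same result: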
import numpy as np

def rot_matrix(theta):
    # returns the 2-D rotation matrix for a counter-clockwise angle theta (radians)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 2.565
rot = rot_matrix(theta)
# rotate every row vector of xx (defined in the question above)
np.dot(rot, xx.T).T
The result is close to the output of the PCA transform; the small differences come from rounding theta to 2.565:
array([[ 1.38349574, 0.29315446],
[ 2.22182084, -0.25201619],
[ 3.60531658, 0.04113827],
[-1.38349574, -0.29315446],
[-2.22182084, 0.25201619],
[-3.60531658, -0.04113827]])
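Instead of guessing the angle, you can also read the rotation straight off the fitted model. The sketch below (not part of the original answer) relies on scikit-learn's documented attributes `mean_` and `components_`: `fit_transform` is equivalent to `(xx - mean_) @ components_.T`, `components_` is orthonormal, and `inverse_transform` recovers the input when all components are kept. The sign of each component is only a convention and may differ between scikit-learn versions, so the recovered angle may not be exactly 2.565, and `components_` can even be a reflection (determinant -1) rather than a pure rotation.

```python
import numpy as np
from sklearn.decomposition import PCA

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
transformed = pca.fit_transform(xx)

# fit_transform centres the data, then projects it onto the principal axes
# (the rows of components_); here mean_ is [0, 0], so centring changes nothing.
manual = (xx - pca.mean_) @ pca.components_.T
print(np.allclose(manual, transformed))  # True

# components_ is orthogonal: R @ R.T is the identity matrix.
R = pca.components_
print(np.allclose(R @ R.T, np.eye(2)))  # True

# det(R) is +1 for a pure rotation and -1 if a reflection is involved.
if np.isclose(np.linalg.det(R), 1.0):
    # R has the form [[cos, -sin], [sin, cos]], so recover the angle from it.
    angle = np.arctan2(R[1, 0], R[0, 0])
    print("rotation angle (radians):", angle)
else:
    print("components_ includes a reflection, so there is no single angle")

# Nothing is lost when all components are kept: the transform is invertible.
print(np.allclose(pca.inverse_transform(transformed), xx))  # True
```

The same reasoning carries over to more than two features: with all components kept, PCA is just a change of orthonormal basis, so distances between points are preserved and only their coordinates change.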