Why are the results of PCA and IncrementalPCA so different?
I used PCA and IncrementalPCA to fit the same data, but when I transform a sample with each model, the outputs are very different.
Can you help me explain why? Thank you very much!
```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

data = np.random.random([100000, 512])

pca_obj = PCA(n_components=256)
ipca_obj = IncrementalPCA(n_components=256, batch_size=1000)
pca_obj.fit(data)
ipca_obj.fit(data)

# Project the first sample with each fitted model and compare the outputs.
print(pca_obj.transform(np.expand_dims(data[0], axis=0)))
print(ipca_obj.transform(np.expand_dims(data[0], axis=0)))
```
From the scikit-learn docs:
IPCA builds a low-rank approximation for the input data using an amount of memory which is independent of the number of input data samples.
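To make the memory claim concrete, here is a minimal sketch, assuming data arrives in chunks (the generator `iter_batches` is hypothetical and stands in for reading chunks from disk or a `np.memmap`). `IncrementalPCA.partial_fit` updates the model one batch at a time, and the fitted state (`components_`, `mean_`, and so on) scales with `n_components * n_features`, not with the number of samples. Calling `fit(data)` with `batch_size=1000`, as in the question, slices an in-memory array and calls `partial_fit` on each slice in the same way.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def iter_batches(n_samples, n_features, batch_size, seed=0):
    """Hypothetical data source: yields random batches one at a time,
    standing in for chunks read from disk or a np.memmap."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        yield rng.random((size, n_features))

n_samples, n_features, batch_size = 20_000, 64, 1_000
ipca = IncrementalPCA(n_components=16)

# Only one batch_size x n_features block is held in memory at a time;
# each partial_fit call updates the running mean and the top-16 components.
for batch in iter_batches(n_samples, n_features, batch_size):
    ipca.partial_fit(batch)

print(ipca.n_samples_seen_)     # 20000
print(ipca.components_.shape)   # (16, 64)
```

Each batch must contain at least `n_components` samples, so the batch size cannot be made arbitrarily small.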
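IPCA should only be used on massive data sets. It fits the data one batch at a time and keeps only the top `n_components` singular vectors between batches, so in effect it works from a compressed summary of your data rather than the full matrix. The larger the data set, the closer the IPCA projection will look to the PCA projection, but it will always be an approximation, and this will be more obvious with small data sets.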
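One caveat when comparing the printed vectors directly: each principal component is only defined up to sign, and when many eigenvalues are nearly equal (as with the uniform random data in the question, whose covariance is close to a multiple of the identity) the individual directions are poorly determined. Comparing `transform` outputs coordinate by coordinate can therefore exaggerate the gap. The sketch below is an illustration under stated assumptions, not a scikit-learn recipe: the helper `compare`, the data sizes, and the synthetic low-rank data set are hypothetical. It measures the largest principal angle between the two component subspaces with `scipy.linalg.subspace_angles` and the mean squared reconstruction error of each model. It passes `svd_solver="full"` so the reference PCA is exact; the default `"auto"` may pick a different solver depending on input size and scikit-learn version.

```python
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA, IncrementalPCA

def compare(X, n_components, batch_size):
    """Hypothetical helper: fit exact PCA and IncrementalPCA on X and compare
    them in ways that do not depend on the sign of individual components."""
    # svd_solver="full" forces an exact SVD for the reference PCA.
    pca = PCA(n_components=n_components, svd_solver="full").fit(X)
    ipca = IncrementalPCA(n_components=n_components, batch_size=batch_size).fit(X)

    # Largest principal angle (radians) between the two fitted subspaces:
    # 0 means identical subspaces, pi/2 means some direction is orthogonal.
    max_angle = subspace_angles(pca.components_.T, ipca.components_.T).max()

    # Mean squared error after projecting onto n_components and back.
    pca_err = np.mean((X - pca.inverse_transform(pca.transform(X))) ** 2)
    ipca_err = np.mean((X - ipca.inverse_transform(ipca.transform(X))) ** 2)
    return max_angle, pca_err, ipca_err

rng = np.random.default_rng(0)
n_samples, n_features, k = 5_000, 64, 16

# Nearly isotropic data, similar in kind to np.random.random in the question.
X_iso = rng.random((n_samples, n_features))

# Assumed low-rank data: 16 strong directions plus a little Gaussian noise.
Z = rng.standard_normal((n_samples, k))
W = rng.standard_normal((k, n_features))
X_lowrank = Z @ W + 0.01 * rng.standard_normal((n_samples, n_features))

for name, X in [("isotropic", X_iso), ("low-rank", X_lowrank)]:
    angle, pca_err, ipca_err = compare(X, n_components=k, batch_size=500)
    print(f"{name:10s} max angle={angle:.3f} rad  "
          f"PCA MSE={pca_err:.5f}  IPCA MSE={ipca_err:.5f}")
```

On the training data, exact PCA gives the smallest reconstruction error achievable with `n_components` dimensions, so IPCA's error can at best match it; the angle shows how far apart the two subspaces actually are, independent of component signs.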