I have a time series of first differences to which I apply PCA using scikit-learn to get the first PC:
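import numpy
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# data is a time series of first differences (rows = observations, columns = variables)
pca = PCA(n_components=1)
pca.fit(data)
pc1_trans = pca.transform(data)                # projection computed by scikit-learn
pc1_dot = numpy.dot(data, pca.components_.T)   # manual projection of the raw data
plt.plot(numpy.cumsum(pc1_dot))
plt.plot(numpy.cumsum(pc1_trans))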
I thought the dot product (projection) of the original data with the first component would give the same result as calling pca.transform, but it does not (results below; the orange line is the output of transform). Why is this?
Found the answer here
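scikit-learn's PCA centers the data before projecting it: transform subtracts the per-feature mean learned during fit (stored in pca.mean_) and only then multiplies by the components. So these are equivalent:
pc1_trans = pca.transform(data)
# pca.mean_ is the column-wise mean, i.e. data.mean(axis=0); on a NumPy array, data.mean() alone would give a single scalar
pc1_dot = numpy.dot(data - pca.mean_, pca.components_.T)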
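To check this on your own machine, here is a minimal sketch. The synthetic `data` array, the seed, and the names `pc1_raw`, `pc1_centered`, and `offset` are assumptions made up for illustration, not part of the original question; it also assumes the default `whiten=False`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the first-difference series:
# 200 observations of 3 correlated variables with a non-zero mean.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
data = common[:, None] * np.array([1.0, 0.8, 0.6]) + 0.1 * rng.normal(size=(200, 3)) + 0.5

pca = PCA(n_components=1)
pca.fit(data)

pc1_trans = pca.transform(data)
pc1_raw = np.dot(data, pca.components_.T)                   # no centering
pc1_centered = np.dot(data - pca.mean_, pca.components_.T)  # centered with the fitted mean

matches_centered = np.allclose(pc1_trans, pc1_centered)
matches_raw = np.allclose(pc1_trans, pc1_raw)

# The uncentered projection differs from transform by the same constant in every row.
offset = pc1_raw - pc1_trans

print(matches_centered, matches_raw)
```

Because the two projections differ by a constant at every step, their cumulative sums drift apart linearly, which would explain why the two plotted lines diverge over time.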