How to apply the same PCA to train and test sets

I'm applying PCA to my training set and then want to run a classifier on it, for example an SVM. How can I automatically get the same features in the test set, i.e. the same components as in the training set after PCA?

Solution

In Python with scikit-learn, fit both the PCA and the classifier on the training set only. Then transform the test set with the already fitted PCA (without re-fitting it) and predict with the already fitted classifier. Here is an example:

from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# load data
iris = load_iris()

# instantiate PCA and classifier
pca = PCA()
classifier = DecisionTreeClassifier()

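# fit the PCA on the training data and project that data onto the components,
# then fit the classifier on the projected training data
X_transformed = pca.fit_transform(iris.data)
classifier.fit(X_transformed, iris.target)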

# predict "new" data
# (I'm faking it here by using the original data)

newdata = iris.data

# transform new data using already fitted pca
# (don't re-fit the pca)
newdata_transformed = pca.transform(newdata)

# predict labels using the trained classifier

pred_labels = classifier.predict(newdata_transformed)
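
Since the question mentions an SVM and a separate test set, here is a minimal sketch of the same idea with a real held-out split. The 70/30 split, `n_components=2`, and the `random_state` value are arbitrary choices made for illustration and are not part of the original answer. The second half shows scikit-learn's `make_pipeline`, which chains the two steps so that `fit` only ever sees the training data and `predict` applies the fitted PCA to new data for you.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# hold out 30% of the samples as a test set (split sizes are an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# --- manual version ---
# fit the PCA on the training set only, then reuse it on the test set
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)  # transform only, no re-fitting

clf = SVC()
clf.fit(X_train_pca, y_train)
manual_pred = clf.predict(X_test_pca)

# --- pipeline version ---
# the pipeline fits PCA + SVC on the training data and applies
# the fitted PCA automatically when predicting on new data
pipe = make_pipeline(PCA(n_components=2), SVC())
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

print("test accuracy:", pipe.score(X_test, y_test))
```

Both versions perform the same steps. The pipeline just makes it harder to re-fit the PCA on the test set by accident, and it can be passed as a single estimator to tools such as `cross_val_score`.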

The same logic applies in Weka: apply the PCA filter fitted on the training data to the test data, then run predictions on the PCA-transformed test set. See the related Weka topic: Principal Component Analysis on Weka