I'm applying PCA to my training set and want to do classification afterwards, with an SVM for example. How can I automatically get the same features in the test set (the same as in the new training set after PCA)?
In Python with scikit-learn, we fit the PCA and the classifier on the training set only. We then transform the test set with the already fitted PCA and predict its labels with the already trained classifier. Here is an example:
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
# load data
iris = load_iris()
# instantiate PCA and classifier
pca = PCA()
classifier = DecisionTreeClassifier()
# fit the PCA on the training data and transform it, then fit the classifier
X_transformed = pca.fit_transform(iris.data)
classifier.fit(X_transformed, iris.target)
# predict "new" data
# (I'm faking it here by using the original data)
newdata = iris.data
# transform new data using already fitted pca
# (don't re-fit the pca)
newdata_transformed = pca.transform(newdata)
# predict labels using the trained classifier
pred_labels = classifier.predict(newdata_transformed)
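The same logic can also be written more compactly with a scikit-learn pipeline, which fits the PCA and the classifier together on the training set and reuses the fitted PCA when predicting. This is only a sketch: the train/test split, `n_components=2`, the `SVC` classifier, and `random_state=0` are my own choices for illustration and are not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# load data and hold out a real test set (split parameters are assumptions)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# chain PCA and the classifier; n_components=2 is an arbitrary choice
model = make_pipeline(PCA(n_components=2), SVC())

# fit() fits the PCA on the training data only, then trains the SVM
# on the PCA-transformed training data
model.fit(X_train, y_train)

# predict() applies the already fitted PCA to the test data (no re-fit)
# before calling the classifier
pred_labels = model.predict(X_test)

# the fitted PCA step can also be used on its own to inspect the
# test-set features the classifier actually sees
X_test_pca = model.named_steps["pca"].transform(X_test)
```

With a pipeline, the test set cannot accidentally be used to re-fit the PCA, because `predict` only ever calls `transform` on the intermediate steps.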
You should apply the same logic with Weka: apply the fitted PCA filter to the test data, and then run predictions on the PCA-transformed test set. You can check the following Weka-related topic: Principal Component Analysis on Weka