weka

How to apply the same PCA to the train and test sets


I'm applying PCA to my training set and want to run a classifier on it (an SVM, for example). How can I automatically get the same features in the test set, i.e. the same ones as in the training set after PCA?


Solution

  • In Python with scikit-learn, we fit the PCA and the classifier on the training set, then transform the test set with the already fitted PCA and predict its labels with the already trained classifier. Here is an example:

    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.datasets import load_iris
    
    # load data
    iris = load_iris()
    
    # initiate PCA and classifier
    pca = PCA()
    classifier = DecisionTreeClassifier()
    
    # transform / fit
    
    X_transformed = pca.fit_transform(iris.data)
    classifier.fit(X_transformed, iris.target)
    
    # predict "new" data
    # (I'm faking it here by using the original data)
    
    newdata = iris.data
    
    # transform new data using already fitted pca
    # (don't re-fit the pca)
    newdata_transformed = pca.transform(newdata)
    
    # predict labels using the trained classifier
    
    pred_labels = classifier.predict(newdata_transformed)
    

    Apply the same logic in Weka: fit the PCA filter on the training set only, apply that fitted filter to the test set, and then run predictions on the PCA-transformed test set. See also the related Weka topic: Principal Component Analysis on Weka
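
    The example above reuses the training data as "new" data. As a rough sketch of the same idea with a real held-out test set and an SVM (the classifier mentioned in the question), the PCA and the classifier can be chained in a scikit-learn pipeline, so the PCA is fitted on the training rows only and reused unchanged on the test rows. The 70/30 split, `n_components=2`, and the default `SVC` settings are assumptions chosen for illustration, not values from the question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# load data and hold out a test set (split ratio is an illustrative assumption)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# chain PCA and the SVM; n_components=2 is an illustrative assumption
model = make_pipeline(PCA(n_components=2), SVC())

# fit() fits the PCA on the training set only, then trains the SVM
# on the PCA-transformed training set
model.fit(X_train, y_train)

# predict() applies the already fitted PCA to the test set (no re-fit)
# and then predicts with the trained SVM
pred_labels = model.predict(X_test)

# the fitted PCA step can also be used on its own to inspect the test features
X_test_pca = model.named_steps["pca"].transform(X_test)
```

    With this layout there is no separate transform call to forget: the test set always goes through the projection that was learned from the training set.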