
Getting the column names chosen after a feature selection method


Given the simple feature selection code below, I want to know which columns are selected after feature selection (the dataset includes a header V1 ... V20).

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression


def feature_selection(data):
    y = data['Class']
    X = data.drop(['Class'], axis=1)
    fs = SelectKBest(score_func=f_regression, k=10)

    # Applying feature selection
    X_selected = fs.fit_transform(X, y)
    # TODO: determine the columns being selected

    return X_selected


data = pd.read_csv("../dataset.csv")
new_data = feature_selection(data)

I appreciate any help.


Solution

  • I used the iris dataset for my example, but you should be able to adapt the code to your use case. Once fitted, SelectKBest exposes a scores_ attribute, which I used to rank the features.

    Feel free to ask for any clarifications.

    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.datasets import load_iris
    
    
    def feature_selection(data):
        y = data[1]
        X = data[0]
        column_names = ["A", "B", "C", "D"]  # Here you should use your dataframe's column names
        k = 2
    
        fs = SelectKBest(score_func=f_regression, k=k)
    
        # Applying feature selection
        X_selected = fs.fit_transform(X, y)
    
        # Find the top features:
        # build pairs like [(ColumnName1, Score1), (ColumnName2, Score2), ...],
        # then sort them by score in descending order
        top_features = sorted(zip(column_names, fs.scores_), key=lambda x: x[1], reverse=True)
        print(top_features[:k])
    
        return X_selected
    
    
    data = load_iris(return_X_y=True)
    new_data = feature_selection(data)
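
An alternative that avoids sorting the scores by hand is the selector's get_support() method, which returns a boolean mask over the input columns. The sketch below applies it to a DataFrame shaped like the one in the question. The synthetic data is only a stand-in for dataset.csv (an assumption, since the real file is not shown), and the extra k parameter and the tuple return value are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression


def feature_selection(data, k=10):
    y = data["Class"]
    X = data.drop(columns=["Class"])
    fs = SelectKBest(score_func=f_regression, k=k)

    # Applying feature selection
    X_selected = fs.fit_transform(X, y)

    # get_support() gives a boolean mask over the input columns;
    # the names come back in original column order, not sorted by score
    selected_columns = X.columns[fs.get_support()].tolist()

    # Rebuild a DataFrame so the selected data keeps its column names
    selected_df = pd.DataFrame(X_selected, columns=selected_columns, index=X.index)
    return selected_df, selected_columns


# Synthetic stand-in for dataset.csv (assumption): columns V1..V20 plus Class
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(100, 20)),
    columns=[f"V{i}" for i in range(1, 21)],
)
data["Class"] = (data["V1"] + data["V2"] > 0).astype(int)

new_data, selected_columns = feature_selection(data)
print(selected_columns)
```

If you also want the ranking, you can still zip X.columns with fs.scores_ as in the code above; get_support() only tells you which columns were kept.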