python pandas numpy scikit-learn dimensionality-reduction

Filter DataFrame after sklearn.feature_selection


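I reduce the dimensionality of a dataset (a pandas DataFrame):

    from sklearn.feature_selection import VarianceThreshold

    X = df.to_numpy()  # as_matrix() was removed in pandas 1.0
    sel = VarianceThreshold(threshold=0.1)
    X_r = sel.fit_transform(X)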

Then I want to get back the reduced DataFrame (i.e. keep only the OK columns).

I found only this ugly way to do so, which is very inefficient. Do you have a cleaner idea?

    cols_OK = sel.get_support()  # boolean mask: which columns are OK?
    c = list()
    for i, col in enumerate(cols_OK):
        if col:
            c.append(df.columns[i])  # collect the labels of the kept columns
    df = df[c]

Solution

  • I think you need loc if get_support() returns a boolean mask (the default):

    cols_OK = sel.get_support()
    df = df.loc[:, cols_OK]
    

    and iloc if you ask it for integer indices instead:

    cols_OK = sel.get_support(indices=True)
    df = df.iloc[:, cols_OK]
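
For reference, here is a minimal end-to-end sketch of both variants on a toy DataFrame. The data and the names df_mask and df_idx are made up for illustration; only VarianceThreshold, get_support, loc and iloc come from the question and answer above.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical): column "b" is constant, so its variance is 0
# and it falls below the threshold.
df = pd.DataFrame({
    "a": [0.0, 1.0, 2.0, 3.0],
    "b": [1.0, 1.0, 1.0, 1.0],
    "c": [5.0, 0.0, 5.0, 0.0],
})

sel = VarianceThreshold(threshold=0.1)
sel.fit(df.to_numpy())

# Variant 1: boolean mask (the default) with loc
mask = sel.get_support()
df_mask = df.loc[:, mask]

# Variant 2: integer column positions with iloc
idx = sel.get_support(indices=True)
df_idx = df.iloc[:, idx]

print(list(df_mask.columns))  # ['a', 'c']
```

Both variants keep the original column labels and index, which is what is lost when working with the bare array returned by fit_transform.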