python machine-learning scikit-learn feature-selection

Sklearn: "invalid value encountered in true_divide" when using SelectKBest(f_classif, ...)


I'm not sure what causes this warning:

RuntimeWarning: invalid value encountered in true_divide
  msw = sswn / float(dfwn)

It appears when I run the following code:

import io
import pandas as pd
from sklearn import model_selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

df = pd.read_csv(
    io.StringIO(
        "x0,x1,y\n10.354468012163927,7.655143584899129,168.06121374114608\n8.786243147880384,6.244283164157256,156.570749155167\n10.450548129254543,8.084427493431185,152.10261405911672\n10.869778308219216,9.165630427431644,129.72126680171317\n11.236593954599316,5.7987616455741575,55.294961794556315\n9.111226379916955,10.289447419679227,308.7475968288771\n9.753313270715008,9.803181441185592,163.337342478704\n9.752270042969856,9.004988677803736,271.9442757290742\n8.67161845864426,9.801711898528824,158.09622149503954\n8.830913103331573,6.632544281651334,316.23912914041557\n"
    )
)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop("y", axis=1),
    df["y"],
    test_size=0.2,
)

X_new = SelectKBest(f_classif, k=2).fit_transform(X_train, y_train)

Solution

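  • You are using a scoring function meant for classification (f_classif), but your
    problem is a regression problem. f_classif runs a one-way ANOVA that treats every
    distinct value of y as its own class. With a continuous target, each class holds a
    single sample, so the within-class degrees of freedom (dfwn, the number of samples
    minus the number of classes) are zero and the line msw = sswn / float(dfwn) from
    the warning divides by zero.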

        x0          x1          y
    0   10.354468   7.655144    168.061214
    1   8.786243    6.244283    156.570749
    2   10.450548   8.084427    152.102614
    3   10.869778   9.165630    129.721267
    4   11.236594   5.798762    55.294962
    5   9.111226    10.289447   308.747597
    6   9.753313    9.803181    163.337342
    7   9.752270    9.004989    271.944276
    8   8.671618    9.801712    158.096221
    9   8.830913    6.632544    316.239129
    

    The label y is a continuous float value, not a class label.

    Instead of f_classif, try one of these two scoring functions for regression:

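    from sklearn.feature_selection import f_regression
    from sklearn.feature_selection import mutual_info_regression
    
    # Pick one of the two; the second assignment overwrites the first.
    # Option 1: univariate linear F-test between each feature and y
    X_new = SelectKBest(f_regression, k=2).fit_transform(X_train, y_train)
    # Option 2: mutual information, which can also pick up non-linear dependence
    X_new = SelectKBest(mutual_info_regression, k=2).fit_transform(X_train, y_train)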
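To check this on your own data before choosing a scoring function, here is a minimal sketch. It assumes the ten rows from the question (rounded as in the table above) and uses the whole frame rather than a train/test split so the result is reproducible. The variable names (CSV, target_type, n_classes, dfwn) are only illustrative; dfwn is computed by hand here as samples minus distinct y values, mirroring the name in the warning.

```python
import io

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.utils.multiclass import type_of_target

# Rounded copy of the ten rows from the question (illustrative only).
CSV = """x0,x1,y
10.354468,7.655144,168.061214
8.786243,6.244283,156.570749
10.450548,8.084427,152.102614
10.869778,9.165630,129.721267
11.236594,5.798762,55.294962
9.111226,10.289447,308.747597
9.753313,9.803181,163.337342
9.752270,9.004989,271.944276
8.671618,9.801712,158.096221
8.830913,6.632544,316.239129
"""

df = pd.read_csv(io.StringIO(CSV))
X, y = df.drop("y", axis=1), df["y"]

# scikit-learn's own classification of the target; "continuous"
# indicates a regression target rather than class labels.
target_type = type_of_target(y.to_numpy())

# A classification F-test groups samples by distinct y value.
# If every value is unique, each group has exactly one sample.
n_samples = len(y)
n_classes = y.nunique()
dfwn = n_samples - n_classes  # within-class degrees of freedom

# The regression F-test does not group by y, so its scores are usable here.
scores, p_values = f_regression(X, y)
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)

print(target_type, n_classes, dfwn)
print(scores, X_new.shape)
```

Note that with only two features, k=2 keeps both columns; you would need k=1 to actually drop one.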