I'm not sure what causes this warning:
RuntimeWarning: invalid value encountered in true_divide
msw = sswn / float(dfwn)
It appears when I run the following:
import io
import pandas as pd
from sklearn import model_selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
df = pd.read_csv(
    io.StringIO(
        "x0,x1,y\n10.354468012163927,7.655143584899129,168.06121374114608\n8.786243147880384,6.244283164157256,156.570749155167\n10.450548129254543,8.084427493431185,152.10261405911672\n10.869778308219216,9.165630427431644,129.72126680171317\n11.236593954599316,5.7987616455741575,55.294961794556315\n9.111226379916955,10.289447419679227,308.7475968288771\n9.753313270715008,9.803181441185592,163.337342478704\n9.752270042969856,9.004988677803736,271.9442757290742\n8.67161845864426,9.801711898528824,158.09622149503954\n8.830913103331573,6.632544281651334,316.23912914041557\n"
    )
)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop("y", axis=1),
    df["y"],
    test_size=0.2,
)
X_new = SelectKBest(f_classif, k=2).fit_transform(X_train, y_train)
You are using a scoring function meant for classification (f_classif), but your target is continuous, so this is a regression problem:
          x0         x1           y
0  10.354468   7.655144  168.061214
1   8.786243   6.244283  156.570749
2  10.450548   8.084427  152.102614
3  10.869778   9.165630  129.721267
4  11.236594   5.798762   55.294962
5   9.111226  10.289447  308.747597
6   9.753313   9.803181  163.337342
7   9.752270   9.004989  271.944276
8   8.671618   9.801712  158.096221
9   8.830913   6.632544  316.239129
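To see where the warning likely comes from, here is a minimal sketch in plain Python (not scikit-learn's actual code) of the quantities behind the line `msw = sswn / float(dfwn)`. It assumes a one-way ANOVA that treats every distinct y value as its own class, and it uses all ten rows, rounded, rather than the random training split:

```python
# Values from the question's DataFrame, rounded for brevity (illustrative only).
x0 = [10.35, 8.79, 10.45, 10.87, 11.24, 9.11, 9.75, 9.75, 8.67, 8.83]
y = [168.06, 156.57, 152.10, 129.72, 55.29, 308.75, 163.34, 271.94, 158.10, 316.24]

# A classification F-test groups the samples by label,
# so every distinct float in y becomes its own "class".
groups = {}
for xi, yi in zip(x0, y):
    groups.setdefault(yi, []).append(xi)

n_samples = len(y)
n_classes = len(groups)

# Within-class sum of squares: each class holds a single sample,
# so every sample equals its class mean and the sum is zero.
ss_within = sum(
    sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values()
)

# Within-class degrees of freedom: samples minus classes.
df_within = n_samples - n_classes

print(n_classes, ss_within, df_within)
```

Both the within-class sum of squares and its degrees of freedom come out as zero, so the mean square is 0 / 0. With NumPy arrays that gives NaN plus an "invalid value encountered" warning instead of an exception.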
The label y is a float value, not a class. Instead of f_classif, try one of these two:
from sklearn.feature_selection import f_regression
from sklearn.feature_selection import mutual_info_regression
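# Option 1: univariate F-test between each feature and the continuous target
X_new = SelectKBest(f_regression, k=2).fit_transform(X_train, y_train)
# Option 2: mutual information, which can also pick up nonlinear dependence
X_new = SelectKBest(mutual_info_regression, k=2).fit_transform(X_train, y_train)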
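For intuition about what f_regression scores, here is a rough plain-Python sketch, assuming the default centered setting: the F statistic for one feature comes from its Pearson correlation r with the target, as r² / (1 − r²) × (n − 2). The helper name `f_score` and the rounded data are illustrative and not part of scikit-learn:

```python
import math


def f_score(x, y):
    """F statistic for one feature against a continuous target (sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Pearson correlation between the feature and the target.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / math.sqrt(var_x * var_y)
    # n - 2 degrees of freedom: a slope and an intercept are estimated.
    return r * r / (1 - r * r) * (n - 2)


# Values from the question's DataFrame, rounded for brevity (illustrative only).
x0 = [10.35, 8.79, 10.45, 10.87, 11.24, 9.11, 9.75, 9.75, 8.67, 8.83]
x1 = [7.66, 6.24, 8.08, 9.17, 5.80, 10.29, 9.80, 9.00, 9.80, 6.63]
y = [168.06, 156.57, 152.10, 129.72, 55.29, 308.75, 163.34, 271.94, 158.10, 316.24]

scores = [f_score(x0, y), f_score(x1, y)]
print(scores)
```

Because no grouping by label is involved, the scores stay finite for a float target. Note that with only two features, k=2 keeps both columns regardless of their scores.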