I am trying to use KFold cross-validation on an XGBoost regression problem. A sample of the data is this:
When I use the following code:
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from xgboost import XGBClassifier

df = pd.read_csv('../data/df_samp.csv').head(1000)
# One-hot encode the categorical columns
cat_columns = ['primary_use','meter','hour','weekday','month','wind_compass']
df_processed = pd.get_dummies(df, prefix_sep="_", columns=cat_columns)
X=df_processed.drop(['meter_reading','outlier_ratio','meter_reading_roll_avg','timestamp'],axis=1)
y=df_processed['meter_reading']
scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, shuffle=False)
# Fit and score the model once per fold
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    X_train, X_test, y_train, y_test = X.values[train_index], X.values[test_index], y.values[train_index], y.values[test_index]
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    scores.append(r2_score(y_test,predictions))
I get the output
print(scores)
[0.406908684278529, 0.3320925821156784, 0.1039843686445262, 0.395466094618815, 0.13412072574647682, -0.015579242639622182, -0.17008382837529967, 0.3931056789610018, 0.4491969042604125, 0.49641651402527265]
When I try cross_val_score instead:
scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, random_state=42, shuffle=False)
cross_val_score(model, X.values, y.values, cv=10)
I get
ValueError: continuous is not supported
Does anybody know why?
Thank you
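Thank you MrSoLoDolo for your suggestion.
I needed to use XGBRegressor()
instead of XGBClassifier(). The target, meter_reading, is continuous, and cross_val_score scores a classifier with accuracy by default, which rejects continuous targets with "continuous is not supported".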