I'm trying to train an xgboost model on this data.
The 'start_time' and 'end_time' columns were in yyyy-mm-dd hh:mm:ss format.
I converted them to strings using astype(str) and then changed them to yyyymmddhhmmss format using regular expressions.
xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree',
metric='multi:softmax')
hr_pred = xgb_model.fit(x_train, np.ravel(y_train, order='C')).predict(x_test)
print(classification_report(y_test, hr_pred))
But I got the following error, which I have never seen before:
ValueError: DataFrame.dtypes for data must be int, float, bool or categorical. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`.start_time, end_time
How can I solve this problem?
Thanks for your help.
It seems that you have categorical data: start_time and end_time are of object type, which XGBoost does not accept.
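To confirm which columns trigger the error, you can list the non-numeric ones. This is only a minimal sketch: the small frame below is a made-up stand-in for x_train (only the two column names come from the question), and the regex is my guess at the conversion described there.

```python
import pandas as pd

# Hypothetical stand-in for x_train; only the column names come from the question.
x_train = pd.DataFrame({
    "start_time": pd.to_datetime(["2021-01-01 08:00:00", "2021-01-02 09:30:00"]),
    "end_time": pd.to_datetime(["2021-01-01 08:45:00", "2021-01-02 10:00:00"]),
    "value": [1.5, 2.0],
})

# Mimic the conversion described in the question (assumed regex: strip non-digits).
for col in ["start_time", "end_time"]:
    x_train[col] = x_train[col].astype(str).str.replace(r"\D", "", regex=True)

# Columns that are neither numeric nor boolean; here these are the string columns
# named at the end of the error message.
bad_cols = x_train.select_dtypes(exclude=["number", "bool"]).columns.tolist()
print(bad_cols)
```

Even though the values look like numbers after the conversion, they are still stored as strings, so the dtype check fails.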
You need either to drop them or to encode them.
To drop them:
xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree',
metric='multi:softmax')
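# select_dtypes is the public pandas way to keep only numeric/bool columns
# (used here instead of the private _get_numeric_data())
hr_pred = xgb_model.fit(x_train.select_dtypes(include=['number', 'bool']), np.ravel(y_train, order='C')).predict(x_test.select_dtypes(include=['number', 'bool']))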
print(classification_report(y_test, hr_pred))
To encode them, have a look at this library: https://contrib.scikit-learn.org/category_encoders/
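If you would rather not add a dependency, one possible alternative for timestamp columns is to turn them into numeric features with plain pandas. Note that this is not the category_encoders library linked above, just a sketch under assumptions: the helper name encode_times, the derived feature names and the demo data are all hypothetical, and it assumes the strings are in the yyyymmddhhmmss form from the question.

```python
import pandas as pd

def encode_times(df):
    """Return a copy where start_time/end_time are replaced by numeric features.

    Hypothetical helper; assumes both columns hold yyyymmddhhmmss strings.
    """
    out = df.copy()
    for col in ["start_time", "end_time"]:
        # Parse the yyyymmddhhmmss strings back into timestamps.
        ts = pd.to_datetime(out[col], format="%Y%m%d%H%M%S")
        out[col + "_hour"] = ts.dt.hour            # hour of day, 0-23
        out[col + "_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
        out[col] = ts
    # Elapsed time between start and end, in seconds.
    out["duration_s"] = (out["end_time"] - out["start_time"]).dt.total_seconds()
    # Drop the raw timestamp columns so only numeric columns remain.
    return out.drop(columns=["start_time", "end_time"])

# Made-up demo data standing in for x_train.
x_demo = pd.DataFrame({
    "start_time": ["20210101080000", "20210102093000"],
    "end_time": ["20210101084500", "20210102100000"],
    "value": [1.5, 2.0],
})
encoded = encode_times(x_demo)
print(encoded.dtypes)
```

You would apply the same function to both x_train and x_test before calling fit and predict. Which features are actually useful (hour, weekday, duration, ...) depends on your data, so treat the ones above as placeholders.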