python xgboost

I got this error: 'DataFrame.dtypes for data must be int, float, bool or categorical'


I'm trying to train an XGBoost model on a DataFrame.

The 'start_time' and 'end_time' columns were in yyyy-mm-dd hh:mm:ss format.

I converted them to strings using astype(str) and then changed them to yyyymmddhhmmss format using regular expressions.

xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree', 
                                  metric='multi:softmax')
hr_pred = xgb_model.fit(x_train, np.ravel(y_train, order='C')).predict(x_test)
print(classification_report(y_test, hr_pred))

But the following error occurred, and I've never seen anything like it before.

ValueError: DataFrame.dtypes for data must be int, float, bool or categorical.  When
            categorical type is supplied, DMatrix parameter
            `enable_categorical` must be set to `True`.start_time, end_time

How can I solve this problem?

Thanks for your help.


Solution

  • It seems that you have non-numeric data. After astype(str), start_time and end_time have object dtype, which XGBoost does not accept.

    You need to either drop them or encode them.

    To drop them:

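    xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree', 
                                      metric='multi:softmax')
    # Keep only numeric and boolean columns (public alternative to the private _get_numeric_data())
    x_train_num = x_train.select_dtypes(include=['number', 'bool'])
    x_test_num = x_test.select_dtypes(include=['number', 'bool'])
    hr_pred = xgb_model.fit(x_train_num, np.ravel(y_train, order='C')).predict(x_test_num)
    print(classification_report(y_test, hr_pred))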
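
Before dropping anything, it can help to check which columns would actually be discarded. The following is a minimal sketch on a made-up frame standing in for x_train; the `duration` column and all values are hypothetical and only there for illustration.

```python
import pandas as pd

# Toy frame standing in for x_train; 'duration' and the values are hypothetical.
x_train = pd.DataFrame({
    "start_time": ["20220101093000", "20220101101500"],
    "end_time": ["20220101100000", "20220101110000"],
    "duration": [1800, 2700],
})

# Columns whose dtype is neither numeric nor boolean (these trigger the ValueError)
non_numeric = x_train.select_dtypes(exclude=["number", "bool"]).columns.tolist()
print(non_numeric)

# The frame that would be passed to fit() after dropping them
numeric_only = x_train.select_dtypes(include=["number", "bool"])
print(numeric_only.dtypes)
```

If the list contains columns you expected to be numeric, they probably need converting rather than dropping.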
    

    To encode them, have a look at this library: https://contrib.scikit-learn.org/category_encoders/
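
Because these two columns are timestamps rather than true categories, another option (not the category_encoders approach linked above) is to parse them back into datetimes and derive numeric features with plain pandas. The sketch below assumes the strings are in the yyyymmddhhmmss format described in the question. The helper name `encode_times` and the choice of features (start hour, day of week, duration in seconds) are illustrative assumptions, not something from the original post.

```python
import pandas as pd

# Toy frame standing in for x_train; the values are made up.
x_train = pd.DataFrame({
    "start_time": ["20220101093000", "20220101101500"],
    "end_time": ["20220101100000", "20220101110000"],
})

def encode_times(df):
    """Hypothetical helper: replace the two string columns with numeric features."""
    out = df.copy()
    start = pd.to_datetime(out["start_time"], format="%Y%m%d%H%M%S")
    end = pd.to_datetime(out["end_time"], format="%Y%m%d%H%M%S")
    out["start_hour"] = start.dt.hour              # integer hour of day
    out["start_dayofweek"] = start.dt.dayofweek    # Monday=0 ... Sunday=6
    out["duration_s"] = (end - start).dt.total_seconds()
    # Drop the original object-dtype columns so only numeric columns remain
    return out.drop(columns=["start_time", "end_time"])

encoded = encode_times(x_train)
print(encoded.dtypes)
```

The same function would need to be applied to x_test so that both frames end up with identical columns before calling fit and predict.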