Tags: python, data-science, xgboost

Shape Mismatch XGBoost Regressor


I have trained an XGBoost Regressor model on data whose shape differs from that of the test data I intend to predict on. Is there a way to work around this, or a model that can tolerate feature mismatches?

The training and test data ended up with different columns after one-hot encoding the categorical features.

import xgboost as xgb

# X, y: one-hot encoded training features and target
# test_data: one-hot encoded features to predict on
best_xgb = xgb.XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                            colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                            gamma=0, gpu_id=-1, importance_type=None,
                            interaction_constraints='', learning_rate=0.05, max_delta_step=0,
                            max_depth=6, min_child_weight=10, monotone_constraints='()',
                            n_estimators=400, n_jobs=4,
                            num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
                            reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
                            validate_parameters=1, verbosity=None)

best_xgb.fit(X, y)

# Fails: test_data does not have the same columns as X
best_xgb.predict(test_data)

The predict call fails with a shape mismatch error.


Solution

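  • Find the 249 - 235 = 14 features that are in the training data but missing from the test data.
    Alternatively, fit the model on only the columns that the test data contains (this assumes every column of test_data also exists in X):

    best_xgb.fit(X[test_data.columns], y)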
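If you would rather keep the model as trained, another option is to align the test columns to the training columns instead. The following is a minimal sketch using pandas only, not necessarily how the original data was prepared. The toy frames `train_raw` and `test_raw` and their column names are hypothetical; `X` and `test_data` stand in for the encoded frames from the question.

```python
import pandas as pd

# Hypothetical toy data: "city" has a category in training ("Lima")
# that never appears in the test set.
train_raw = pd.DataFrame({"size": [50, 80, 65], "city": ["Oslo", "Rome", "Lima"]})
test_raw = pd.DataFrame({"size": [70, 55], "city": ["Oslo", "Rome"]})

# Encoding each frame separately yields different column sets,
# which is the kind of mismatch described in the question.
X = pd.get_dummies(train_raw, columns=["city"])
test_data = pd.get_dummies(test_raw, columns=["city"])

# Columns the model was trained on but the test data lacks.
missing = X.columns.difference(test_data.columns)
print(list(missing))

# Reindex the test data to the training columns: missing dummy columns
# are filled with 0, columns not seen in training are dropped, and the
# column order matches X.
test_aligned = test_data.reindex(columns=X.columns, fill_value=0)
```

With the frames aligned this way, `best_xgb.predict(test_aligned)` should receive the same number of features the model was fit on. Filling with 0 is only appropriate for one-hot dummy columns, where 0 means "category not present"; a missing numeric feature would need a different treatment.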