xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.01, max_depth=6, reg_alpha=15, n_estimators=1000, subsample=0.5)
xg_reg_1 = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=300)
Also, if I use the Booster object in SelectFromModel, it throws an error. Kindly let me know the changes to be made to the code.
xgb_fea_imp = pd.DataFrame(list(xg_reg_1.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)
threshold1 = xgb_fea_imp.T.to_numpy()
from sklearn.feature_selection import SelectFromModel
# select the features
selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
feature_idx = selection.get_support()
feature_name = X.columns[feature_idx]
selected_dataset = selection.transform(X)
selected_dataset = pd.DataFrame(selected_dataset)
selected_dataset.columns = feature_name
The error is as follows:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-b089dd085f01> in <module>
4 selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
5
----> 6 feature_idx = selection.get_support()
7 feature_name = X.columns[feature_idx]
8 #print(feature_idx)
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in get_support(self, indices)
50 values are indices into the input feature vector.
51 """
---> 52 mask = self._get_support_mask()
53 return mask if not indices else np.where(mask)[0]
54
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_from_model.py in _get_support_mask(self)
186 ' "prefit=True" while passing the fitted'
187 ' estimator to the constructor.')
--> 188 scores = _get_feature_importances(
189 estimator=estimator, getter=self.importance_getter,
190 transform_func='norm', norm_order=self.norm_order)
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in _get_feature_importances(estimator, getter, transform_func, norm_order)
171 getter = attrgetter('feature_importances_')
172 else:
--> 173 raise ValueError(
174 f"when `importance_getter=='auto'`, the underlying "
175 f"estimator {estimator.__class__.__name__} should have "
ValueError: when `importance_getter=='auto'`, the underlying estimator Booster should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.
If I then move ahead and set prefit=False, it asks me to fit the model before using it.
You shouldn't build the xgboost regression model using its core API. The train function of xgb returns a Booster object, which does not have coef_ or feature_importances_ attributes. Use xgb.XGBRegressor, which is compatible with scikit-learn and has feature_importances_ that can be used inside SelectFromModel.