I've recently come across a "strange" observation in my dataset. After fitting an XGB model with 20 features, I plotted the top 10 features with the highest gain values. The result is shown below:
F1 140027.061202
F2 11242.470370
F3 9957.161039
F4 9677.070632
F5 7103.275865
F6 4691.814929
F7 4030.730915
F8 2775.235616
F9 2384.573760
F10 2328.680871
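(A gain listing like this can be read off the fitted model; below is a minimal sketch, assuming the model alg defined further down and a reasonably recent xgboost. Note that get_score's 'gain' is the average gain per split, while 'total_gain' is also available in newer versions.)

# Sketch: extract gain-based importances from the fitted model `alg`
gains = alg.get_booster().get_score(importance_type='gain')
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(name, gain)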
As you can see, F1 dominates all other features in gain (roughly 12x the gain of F2). I verified the results on the test set: the model is not overfitting and gives decent results (compared to my figures of merit):
F1-score: 0.739812237993
Accuracy: 0.839632893701
Precision: 0.63759578607
Recall: 0.881059718486
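(For completeness, a sketch of how such a verification can be computed with scikit-learn, assuming the fitted model alg defined below, test data X_test/y_test, and a 0.5 probability threshold:)

from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

y_prob = alg.predict(X_test)         # binary:logistic -> predicted probabilities
y_pred = (y_prob > 0.5).astype(int)  # hard labels at a 0.5 threshold
print('F1-score: %s' % f1_score(y_test, y_pred))
print('Accuracy: %s' % accuracy_score(y_test, y_pred))
print('Precision: %s' % precision_score(y_test, y_pred))
print('Recall: %s' % recall_score(y_test, y_pred))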
Based on these results, is it correct to conclude that feature F1 alone is enough for building a model?
To prove this, I re-ran the modeling with the same parameters, but this time with F1 as a standalone feature. The results are only slightly worse than before (and again no overfitting):
F1-score: 0.710906846703
Accuracy: 0.819880412472
Precision: 0.607953806173
Recall: 0.85583736242
My XGB parameters are super simple in both cases:
from xgboost import XGBRegressor

alg = XGBRegressor(
    n_estimators=200,
    max_depth=5,
    objective='binary:logistic',  # logistic output despite the regressor wrapper
    seed=27,
)
# Fit the algorithm on the data
metric = 'map'
alg.fit(X_train, y_train, eval_metric=metric)
After I exclude feature F1 and re-fit the model, I get similar verification metrics (slightly worse), but in that case feature F3 becomes "dominant" with a really high gain (~10000), while feature F2 is the next one, also with a gain of ~10000.
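(For reference, both follow-up experiments amount to column selection on the same pipeline; a minimal sketch, assuming X_train is a pandas DataFrame:)

# 1) F1 as a standalone feature
alg.fit(X_train[['F1']], y_train, eval_metric=metric)

# 2) all features except F1
alg.fit(X_train.drop('F1', axis=1), y_train, eval_metric=metric)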
Thanks!
Have you tried adding and tuning additional parameters, and using grid search to find the optimal combination? To prevent overfitting I can suggest adding and tuning regularization-related parameters such as learning_rate, min_child_weight, gamma, subsample, colsample_bytree, reg_alpha, and reg_lambda.
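A minimal sketch of such a grid search, assuming scikit-learn's GridSearchCV; the grid values below are placeholders, not recommendations:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative parameter grid -- adjust ranges to your data
param_grid = {
    'max_depth': [3, 5, 7],
    'min_child_weight': [1, 5, 10],
    'subsample': [0.7, 0.85, 1.0],
    'colsample_bytree': [0.7, 0.85, 1.0],
}
search = GridSearchCV(
    XGBClassifier(n_estimators=200, objective='binary:logistic', seed=27),
    param_grid,
    scoring='f1',
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)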
Since you are using XGBRegressor with a binary:logistic objective, try modifying the objective function (or switch to XGBClassifier, which is designed for classification). I can also suggest monitoring the validation and training loss while the trees are being built.
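A minimal sketch of such monitoring with early stopping, assuming a held-out split X_valid/y_valid exists; the fit-time eval_set/early_stopping_rounds arguments match older xgboost versions (newer versions move some of these to the constructor):

from xgboost import XGBClassifier

clf = XGBClassifier(n_estimators=200, max_depth=5,
                    objective='binary:logistic', seed=27)
clf.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],  # watch both losses
    eval_metric='logloss',
    early_stopping_rounds=20,  # stop once validation loss stops improving
    verbose=True,
)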