python pandas plot xgboost

Get actual feature names from XGBoost model


I know this question has been asked several times, and I've read the answers, but I still haven't been able to figure it out. Like other people, I get feature names such as f56, f234, f12, etc. in the output, and I want the actual names instead. This is the part of the code related to the model:

optimized_params, xgb_model = find_best_parameters()  # fitting and GridSearchCV happen here
xgdmat = xgb.DMatrix(X_train_scaled, y_train_scaled)
feature_names = xgdmat.feature_names
final_gb = xgb.train(optimized_params, xgdmat,
                     num_boost_round=find_optimal_num_trees(optimized_params, xgdmat))


final_gb.get_fscore()
mapper = {'f{0}'.format(i): v for i, v in enumerate(xgdmat.feature_names)}
mapped = {mapper[k]: v for k, v in final_gb.get_fscore().items()}
mapped
xgb.plot_importance(mapped, color='red')   

I also tried this:

feature_important = final_gb.get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by = "score", ascending=False)
data.plot(kind='barh')

but the features are still shown as f followed by a number. I'd really appreciate any help.

What I'm doing at the moment is taking the number at the end of each f-name, like 234 from f234, and using it in X_train.columns[234] to see what the actual name was. However, I'm not sure whether the name I get this way is really the feature that f234 represents.


Solution

  • First build a dictionary that maps each column position to its original feature name, then use it to replace the f-number tick labels on the importance plot.

    import matplotlib.pyplot as plt
    import xgboost as xgb

    # map column position -> original name (f<i> refers to column i of the training data)
    # use the DataFrame that still has its column names, e.g. X_train before scaling;
    # a scaled NumPy array has no .columns attribute
    myfeatures = X_train.columns
    dict_features = dict(enumerate(myfeatures))

    # feature importance plot with the default names f0, f1, ...
    axsub = xgb.plot_importance(final_gb)

    # replace the f<i> tick labels with the original names
    lst_yticklabels = [label.get_text().lstrip('f') for label in axsub.get_yticklabels()]
    lst_yticklabels = [dict_features[int(i)] for i in lst_yticklabels]

    axsub.set_yticklabels(lst_yticklabels)
    print(dict_features)
    plt.show()
    

    Here is an example of how it works: [image]
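
The same position-to-name mapping can also be applied to the score dictionary itself rather than to the plot labels. The following is a minimal sketch under stated assumptions, not part of the original answer: `rename_importance` is a hypothetical helper, and the `scores` and `columns` values are made-up stand-ins for `final_gb.get_score()` and `list(X_train.columns)`. It relies on the default XGBoost names being `f` followed by the 0-based column index, which is also why `X_train.columns[234]` is the feature behind `f234`, provided the column order was not changed before training.

```python
import re


def rename_importance(scores, columns):
    """Map default keys such as 'f12' to columns[12]; leave any other key untouched."""
    pattern = re.compile(r"f(\d+)")
    renamed = {}
    for key, value in scores.items():
        match = pattern.fullmatch(key)
        if match and int(match.group(1)) < len(columns):
            # the number after 'f' is the 0-based column position
            renamed[columns[int(match.group(1))]] = value
        else:
            renamed[key] = value
    return renamed


# Hypothetical values, shaped like the dict returned by final_gb.get_score()
scores = {"f0": 12, "f2": 7}
columns = ["age", "income", "height"]  # stand-in for list(X_train.columns)

renamed = rename_importance(scores, columns)
print(renamed)  # {'age': 12, 'height': 7}
```

The renamed dictionary can then go into the `pd.DataFrame(...)` bar plot from the question. Another option is to avoid the remapping entirely by passing the names when building the matrix, e.g. `xgb.DMatrix(X_train_scaled, y_train_scaled, feature_names=list(X_train.columns))`, so that the booster reports the real names directly.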