
Python XGBoost GPU version underperforming accuracy of CPU version - parameter tuning


I'm developing a fraud detection model using XGBoost.

I cannot share the data (sorry).

The CPU-based model works well and identifies frauds as expected.

The GPU-based model identifies far fewer frauds: given the same confidence level, it flags a much smaller number of cases than the CPU-based model (see the sketch after the parameter lists).

This is the parameter list for the CPU model:

params = {"objective":"multi:softprob", 
          'booster':'dart', 
          'max_depth':5, 
          'eta':0.1, 
          'subsample':0.2, 
          'nthread':mp.cpu_count()-1, 
          'eval_metric':'merror', 
          'colsample_bytree':0.2, 
          'num_class':2}

The parameters for the GPU model training are:

params = {"objective":"multi:softprob", 
          'subsample':0.2, 
          'gpu_id':0, 
          'num_class':2, 
          'tree_method':'gpu_hist', 
          'max_depth':5, 
          'eta':0.1, 
          'gamma':1100, 
          'eval_metric':'mlogloss'}
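
For concreteness, the comparison described above amounts to counting predictions whose class-1 (fraud) probability clears a fixed confidence threshold. Here is a minimal sketch of that count on synthetic stand-in data, since the real set can't be shared; the data shape, threshold, and round count are all assumptions:

import numpy as np
import xgboost as xgb

# Synthetic stand-in for the (unshared) fraud data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)
dtrain = xgb.DMatrix(X[:1600], label=y[:1600])
dvalid = xgb.DMatrix(X[1600:], label=y[1600:])

def flagged_frauds(params, threshold=0.9, rounds=100):
    # Train with the given params and count validation rows whose
    # predicted fraud probability (class 1) meets the threshold.
    booster = xgb.train(params, dtrain, num_boost_round=rounds)
    # multi:softprob -> shape (n, num_class); with booster='dart', recent
    # XGBoost versions use all trees at predict time, so this is deterministic.
    proba = booster.predict(dvalid)
    return int((proba[:, 1] >= threshold).sum())

Calling flagged_frauds with each of the two parameter dicts above (the GPU one only where a CUDA device is available) reproduces the comparison from the question.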

Solution

  • It is due to the use of different tree-construction parameters. The CPU run is most probably using tree_method='exact', since you haven't set a tree_method explicitly, while the GPU run uses tree_method='gpu_hist'. You can test this by adding tree_method='exact' to your CPU params and checking whether you still get the same good accuracy as without it; a sketch of that test follows below. You can find more information on all the tree methods here.
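
A minimal sketch of that test, again on synthetic stand-in data (the dataset shape and round count are assumptions); it also tries tree_method='hist', the CPU counterpart of 'gpu_hist', for comparison:

import numpy as np
import xgboost as xgb

# Synthetic stand-in for the (unshared) fraud data.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)
dtrain = xgb.DMatrix(X[:1600], label=y[:1600])
dvalid = xgb.DMatrix(X[1600:], label=y[1600:])

base = {"objective": "multi:softprob", "num_class": 2,
        "max_depth": 5, "eta": 0.1, "subsample": 0.2,
        "eval_metric": "merror"}

# Compare the implicit default, an explicit 'exact', and 'hist'
# (the CPU counterpart of 'gpu_hist').
for tm in (None, "exact", "hist"):
    params = dict(base)
    if tm is not None:
        params["tree_method"] = tm
    result = {}
    xgb.train(params, dtrain, num_boost_round=100,
              evals=[(dvalid, "valid")],
              evals_result=result, verbose_eval=False)
    print(tm or "default", result["valid"]["merror"][-1])

If the default and 'exact' runs match while 'hist' diverges, the accuracy gap comes from the histogram approximation rather than from the GPU itself; in that case raising max_bin (the histogram bin count, 256 by default) can close part of it.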