
Understanding xgb.dump


I'm trying to build some intuition about what is going on in the xgb.dump of a binary classification model with an interaction depth of 1. Specifically, how the same split (f38 < 2.5) is used twice in a row, in output lines [2] and [6].

The resulting output looks like this:

 xgb.dump(model_2, with.stats = T)
   [1] "booster[0]" 
   [2] "0:[f38<2.5] yes=1,no=2,missing=1,gain=173.793,cover=6317" 
   [3] "1:leaf=-0.0366182,cover=3279.75" 
   [4] "2:leaf=-0.0466305,cover=3037.25" 
   [5] "booster[1]" 
   [6] "0:[f38<2.5] yes=1,no=2,missing=1,gain=163.887,cover=6314.25" 
   [7] "1:leaf=-0.035532,cover=3278.65" 
   [8] "2:leaf=-0.0452568,cover=3035.6"

Is the difference between the first use of f38 and the second use of f38 simply the residual fitting going on? It seemed weird to me at first, and I'm trying to understand exactly what's happening here!

Thanks!


Solution

  • Is the difference between the first use of f38 and the second use of f38 simply the residual fitting going on?

    Most likely yes: it's updating the gradient after the first round and finding that the same feature and split point still give the best gain in your example.
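    Concretely, for binary:logistic the "gradient update" between boosters works on the log-loss statistics below (a sketch, assuming the default base_score = 0.5, i.e. a starting margin of 0):

    sigmoid <- function(m) 1 / (1 + exp(-m))
    p <- sigmoid(0)   # 0.5 for every row before the first tree is built
    # per-row statistics xgboost uses to score splits under log loss:
    #   gradient g_i = p_i - y_i,   hessian h_i = p_i * (1 - p_i)
    # after round 1 the margin only moves by the eta-scaled leaf value, so
    # with a small eta the probabilities (and hence the gradients) barely
    # change, and the same feature/split can win the gain comparison again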

    Here's a reproducible example.

    Note how I lower the learning rate in the second example and it finds the same feature and split point again in all three rounds. In the first example it uses different features in all three rounds.

    require(xgboost)
    data(agaricus.train, package='xgboost')
    train <- agaricus.train
    dtrain <- xgb.DMatrix(data = train$data, label=train$label)
    
    # high learning rate: finds a different first-split feature (f28, f59, f101) in each tree
    bst <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nrounds = 3, nthread = 2, objective = "binary:logistic")
    xgb.dump(model = bst)
    # [1] "booster[0]"                                 "0:[f28<-9.53674e-07] yes=1,no=2,missing=1" 
    # [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=1.71218"                            
    # [5] "4:leaf=-1.70044"                            "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
    # [7] "5:leaf=-1.94071"                            "6:leaf=1.85965"                            
    # [9] "booster[1]"                                 "0:[f59<-9.53674e-07] yes=1,no=2,missing=1" 
    # [11] "1:[f28<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=0.784718"                           
    # [13] "4:leaf=-0.96853"                            "2:leaf=-6.23624"                           
    # [15] "booster[2]"                                 "0:[f101<-9.53674e-07] yes=1,no=2,missing=1"
    # [17] "1:[f66<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=0.658725"                           
    # [19] "4:leaf=5.77229"                             "2:[f110<-9.53674e-07] yes=5,no=6,missing=5"
    # [21] "5:leaf=-0.791407"                           "6:leaf=-9.42142"      
    
    ## changed eta to lower the learning rate: all three trees now use the same splits (f28 at the root, then f55 and f108)
    bst2 <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = .01, nrounds = 3, nthread = 2, objective = "binary:logistic")
    xgb.dump(model = bst2)
    # [1] "booster[0]"                                 "0:[f28<-9.53674e-07] yes=1,no=2,missing=1" 
    # [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=0.0171218"                          
    # [5] "4:leaf=-0.0170044"                          "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
    # [7] "5:leaf=-0.0194071"                          "6:leaf=0.0185965"                          
    # [9] "booster[1]"                                 "0:[f28<-9.53674e-07] yes=1,no=2,missing=1" 
    # [11] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=0.016952"                           
    # [13] "4:leaf=-0.0168371"                          "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
    # [15] "5:leaf=-0.0192151"                          "6:leaf=0.0184251"                          
    # [17] "booster[2]"                                 "0:[f28<-9.53674e-07] yes=1,no=2,missing=1" 
    # [19] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3"  "3:leaf=0.0167863"                          
    # [21] "4:leaf=-0.0166737"                          "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
    # [23] "5:leaf=-0.0190286"                          "6:leaf=0.0182581"
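
    If you want to see the residual fitting directly, you can reconstruct the leaf values in that dump from the gradient statistics. This is a sketch assuming the defaults lambda = 1 and base_score = 0.5; predleaf = TRUE asks predict() for the leaf each row lands in, and -eta * G / (H + lambda) is the standard XGBoost leaf-weight formula:

    # leaf index (node id) each training row falls into, one column per tree
    leaves <- predict(bst2, dtrain, predleaf = TRUE)

    y <- train$label
    margin <- rep(0, length(y))      # base_score = 0.5 -> margin starts at 0
    for (tree in 1:3) {
      p <- 1 / (1 + exp(-margin))    # current predicted probability
      g <- p - y                     # log-loss gradient
      h <- p * (1 - p)               # log-loss hessian
      # leaf weight: w = -eta * sum(g) / (sum(h) + lambda)
      w <- sapply(split(seq_along(g), leaves[, tree]),
                  function(i) -0.01 * sum(g[i]) / (sum(h[i]) + 1))
      print(w)   # should match the leaf values xgb.dump shows for this booster
      # the "gradient update": fold this tree's output into the running margin
      # so the next round fits the (slightly smaller) remaining residual
      margin <- margin + w[as.character(leaves[, tree])]
    }

    Round 2's leaves come out slightly smaller than round 1's (e.g. 0.016952 vs 0.0171218) because the first tree has already nudged the predictions; that shrinking between otherwise-identical trees is exactly the residual fitting you're asking about with f38. The cover stat in your dump is the same bookkeeping: it's just sum(h) over the rows in that node.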