The training loss increases at every iteration and eventually blows up:
Iter Train Loss Remaining Time
1 5313.1014 22.51s
2 5170.8669 21.21s
3 1641863.7866 20.05s
4 1640770.5703 18.86s
5 1695332.9514 17.62s
6 1689162.9816 16.42s
7 1689562.3732 15.26s
8 1803110.9519 14.08s
9 1801803.5873 12.94s
10 2274529.9750 11.77s
11 17589338.0388 10.59s
12 1121779686.7875 10.03s
13 1071057062185277527192667544912333682394851905403317706031104.0000
14 1071057062185277527192667544912333682394851905403317706031104.0000
15 1071057062185277527192667544912333682394851905403317706031104.0000
16 1071057062185277527192667544912333682394851905403317706031104.0000
17 1071057062185277527192667544912333682394851905403317706031104.0000
18 1071057062185277527192667544912333682394851905403317706031104.0000
19 1071057062185277527192667544912333682394851905403317706031104.0000
20 1071057062185277527192667544912333682394851905403317706031104.0000
My input is a large matrix of 0s and 1s (vectorized words, as a sparse matrix), and my targets are integers:
array([131, 64, 64, 134, 32, 50, 42, 154, 124, 29, 64, 154, 137,
64, 64, 64, 89, 16, 125, 64])
Perhaps there's something wrong with my code, but I doubt it. Here it is:
from sklearn.ensemble import GradientBoostingClassifier

# Note: despite the variable name, this is scikit-learn's gradient boosting, not XGBoost.
xgboost = GradientBoostingClassifier(n_estimators=20,
                                     min_samples_leaf=2,
                                     min_samples_split=3,
                                     verbose=10,
                                     max_features=20)
xgboost.fit(xtrain, ytrain)
The shape of my input is:
<1544x19617 sparse matrix of type '<class 'numpy.int64'>'
with 202552 stored elements in Compressed Sparse Row format>
A sudden explosion in training loss is sometimes a sign that the model has been pushed into a degenerate region of the solution space. Reducing the learning rate may help (and appears to here). In gradient boosting, the learning rate scales how much each successive tree contributes to the existing predictions; a smaller learning rate limits how far any single tree can shift the overall predictions, which makes it less likely that the ensemble suddenly lands in a degenerate region.
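For example, a minimal sketch would keep your settings and only shrink learning_rate (scikit-learn's default is 0.1; the 0.01 below is just an illustrative starting point, not a tuned value):

from sklearn.ensemble import GradientBoostingClassifier

# Same configuration as in the question, but with a smaller learning rate so
# each new tree makes only a small correction to the current predictions.
clf = GradientBoostingClassifier(n_estimators=20,
                                 learning_rate=0.01,  # reduced from the 0.1 default
                                 min_samples_leaf=2,
                                 min_samples_split=3,
                                 verbose=10,
                                 max_features=20)
clf.fit(xtrain, ytrain)

If the loss curve is stable at the lower rate, you can then raise learning_rate (or increase n_estimators) gradually to trade training time against fit quality.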