The training loss increases at every iteration and eventually blows up:
Iter Train Loss Remaining Time
1 5313.1014 22.51s
2 5170.8669 21.21s
3 1641863.7866 20.05s
4 1640770.5703 18.86s
5 1695332.9514 17.62s
6 1689162.9816 16.42s
7 1689562.3732 15.26s
8 1803110.9519 14.08s
9 1801803.5873 12.94s
10 2274529.9750 11.77s
11 17589338.0388 10.59s
12 1121779686.7875 10.03s
13 1071057062185277527192667544912333682394851905403317706031104.0000
14 1071057062185277527192667544912333682394851905403317706031104.0000
15 1071057062185277527192667544912333682394851905403317706031104.0000
16 1071057062185277527192667544912333682394851905403317706031104.0000
17 1071057062185277527192667544912333682394851905403317706031104.0000
18 1071057062185277527192667544912333682394851905403317706031104.0000
19 1071057062185277527192667544912333682394851905403317706031104.0000
20 1071057062185277527192667544912333682394851905403317706031104.0000
My input is a large matrix of 0s and 1s (vectorized words, as a sparse matrix), and my targets are integers:
array([131, 64, 64, 134, 32, 50, 42, 154, 124, 29, 64, 154, 137,
64, 64, 64, 89, 16, 125, 64])
Perhaps there's something wrong with my code, but I doubt it. Here it is:
from sklearn.ensemble import GradientBoostingClassifier

# Note: despite the variable name, this is scikit-learn's gradient boosting, not XGBoost.
xgboost = GradientBoostingClassifier(n_estimators=20,
                                     min_samples_leaf=2,
                                     min_samples_split=3,
                                     verbose=10,
                                     max_features=20)
xgboost.fit(xtrain, ytrain)
The shape of my input is:
<1544x19617 sparse matrix of type '<class 'numpy.int64'>'
with 202552 stored elements in Compressed Sparse Row format>
A sudden explosion in training loss is sometimes a sign that the model has been pushed into a degenerate region of the solution space. Reducing the learning rate may help (and appears to here). In gradient boosting, the learning rate scales how much each successive tree contributes to the existing predictions; a smaller learning rate limits how far any single tree can shift the overall predictions, which makes it less likely that the ensemble suddenly lands in a degenerate region.
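For example, a minimal sketch would keep your settings and only shrink learning_rate (scikit-learn's default is 0.1; the 0.01 below is just an illustrative starting point, not a tuned value):

from sklearn.ensemble import GradientBoostingClassifier

# Same configuration as in the question, but with a smaller learning rate so
# each new tree makes only a small correction to the current predictions.
clf = GradientBoostingClassifier(n_estimators=20,
                                 learning_rate=0.01,  # reduced from the 0.1 default
                                 min_samples_leaf=2,
                                 min_samples_split=3,
                                 verbose=10,
                                 max_features=20)
clf.fit(xtrain, ytrain)

If the loss curve is stable at the lower rate, you can then raise learning_rate (or increase n_estimators) gradually to trade training time against fit quality.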