
Mlogloss value in CatBoost starts negative and increases


I am running a CatBoost classifier with the following settings:

model = CatBoostClassifier(iterations=1000, learning_rate=0.05, depth=7, loss_function='MultiClass',calc_feature_importance=True)

I have 5 classes, and the metric starts at negative values and increases while the model fits, as shown below:

0:      learn: -1.5036342       test: -1.5039740        best: -1.5039740 (0)    total: 18s      remaining: 4h 59m 46s
1:      learn: -1.4185548       test: -1.4191364        best: -1.4191364 (1)    total: 37.8s    remaining: 5h 14m 24s
2:      learn: -1.3475387       test: -1.3482641        best: -1.3482641 (2)    total: 56.3s    remaining: 5h 12m 1s
3:      learn: -1.2868831       test: -1.2877465        best: -1.2877465 (3)    total: 1m 15s   remaining: 5h 12m 32s
4:      learn: -1.2342138       test: -1.2351585        best: -1.2351585 (4)    total: 1m 34s   remaining: 5h 13m 56s

Is this normal behaviour? In most machine learning algorithms, log loss is positive and decreases during training. What am I missing here?


Solution

  • Yes, this is normal behaviour.

    When you specify loss_function='MultiClass' in the parameters of your model, CatBoost optimises a different objective, not LogLoss. Its definition can be found in the CatBoost documentation on multiclassification objectives and metrics, and is sketched below.

    To understand the sign of that function, think of the best-case and worst-case scenarios. In the best case, the raw score for an object is concentrated entirely in its correct class t, so the fraction inside the log (in the formula below) equals 1 and the log is 0. As the model diverges from that best case, the fraction decreases towards 0 and the log becomes more and more negative. The metric is therefore always at most 0, and training pushes it upwards towards 0, which is exactly what your training log shows; see the numeric sketch below.
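
    For reference, the objective is the weighted average log of the softmax probability assigned to the true class. In the notation of the CatBoost documentation (where a_{ij} is the raw score of class j for object i, t_i is the true class of object i, w_i is the object weight, and M is the number of classes) it reads:

        \mathrm{MultiClass} = \frac{\sum_{i=1}^{N} w_i \log\left( \frac{e^{a_{i t_i}}}{\sum_{j=0}^{M-1} e^{a_{i j}}} \right)}{\sum_{i=1}^{N} w_i}

    CatBoost maximises this quantity. Every log term is at most 0, so the whole metric is at most 0 and climbs towards 0 as the model improves.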
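
    Here is a minimal numeric sketch of that argument in Python (the helper multiclass_metric is my name for illustration, not part of the CatBoost API). With 5 classes and completely uninformative scores, the per-object value is log(1/5) ≈ -1.609, which is close to the early iterations in your training log:

        import numpy as np

        def multiclass_metric(raw_scores, true_class):
            # Log of the softmax probability assigned to the true class.
            # This is the per-object quantity that the MultiClass metric
            # averages; it is always <= 0 and equals 0 only when all the
            # probability mass sits on the correct class.
            probs = np.exp(raw_scores - raw_scores.max())
            probs /= probs.sum()
            return np.log(probs[true_class])

        # Uninformative scores over 5 classes: log(1/5) ~= -1.609
        print(multiclass_metric(np.zeros(5), true_class=2))

        # Scores strongly favouring the correct class: value approaches 0
        print(multiclass_metric(np.array([0., 0., 10., 0., 0.]), true_class=2))

        # Scores favouring a wrong class: value is very negative
        print(multiclass_metric(np.array([10., 0., 0., 0., 0.]), true_class=2))

    Equivalently, the value CatBoost reports is just the negative of the familiar multiclass log loss, so "increasing towards 0" here means the same thing as "log loss decreasing towards 0".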