
How to fetch the evaluation metric after a CatBoostClassifier.fit()?


I have trained a classification model calling CatBoostClassifier.fit(), also providing an eval_set.

Now, how can I fetch the best value of the evaluation metric, and the iteration at which it was achieved during training? I can plot this information by setting plot=True in the call to fit(), but how can I assign it to a variable?

I can do it when I train the model by calling cv(), as cv() returns the wanted information. But CatBoostClassifier.fit() doesn't return anything, according to the documentation.

Here is the snippet of code I am using to fit the model:

model = CatBoostClassifier(
                           random_seed=42,
                           logging_level='Silent',
                           eval_metric='Accuracy'
                          )

model.fit(X_train,
          y_train,
          cat_features=cat_features_idxs,
          eval_set=(X_val, y_val),
          plot=True
         )

Here is how I fetch the wanted information if I use cv() instead:

cv_data = cv(Pool(X, y, cat_features = cat_features_idxs),
             model.get_params(),
             fold_count = 5,
             plot=True)

print('Validation accuracy (best average among cross-validation folds) is {} obtained at step {}.'.format(np.max(cv_data['test-Accuracy-mean']), np.argmax(cv_data['test-Accuracy-mean'])))

Solution

  • 1) Just compute the score on the training and validation data after fitting:

    https://stackoverflow.com/a/17954831

    model = CatBoostClassifier(
                           random_seed=42,
                           logging_level='Silent',
                           eval_metric='Accuracy'
                          )
    
    model.fit(X_train,
              y_train,
              cat_features=cat_features_idxs,
              eval_set=(X_val, y_val),
              plot=True
             )
    
    train_score = model.score(X_train, y_train) # train (learn) score
    
    val_score = model.score(X_val, y_val) # val (test) score
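
If the goal is the best metric value recorded during training, rather than a score recomputed afterwards, a fitted CatBoost model also exposes get_best_score() and get_best_iteration(). The sketch below wraps the two calls; the helper name summarize_best is hypothetical, and the 'validation' key is an assumption for a single eval_set (with several eval sets the keys may look like 'validation_0'), so inspect model.get_best_score().keys() if the lookup fails.

```python
def summarize_best(model, metric="Accuracy", dataset="validation"):
    """Return (best_value, best_iteration) for a fitted CatBoost model.

    Assumes the model was fitted with an eval_set. `dataset` is the key
    used for that set in get_best_score(); 'validation' is an assumption.
    """
    # Nested dict, e.g. {'learn': {...}, 'validation': {...}}
    best_scores = model.get_best_score()
    # Best recorded value of the chosen metric on the chosen dataset
    best_value = best_scores[dataset][metric]
    # Iteration at which the eval_metric was best on the eval_set
    best_iteration = model.get_best_iteration()
    return best_value, best_iteration
```

Usage would be val_score, best_iter = summarize_best(model). Note that get_best_score() reports the best value per dataset, so the 'learn' entry is the best training value overall and not necessarily the training value at best_iter.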
    

    Another way would be accessing the output files:

    model = CatBoostClassifier(
                           random_seed=42,
                           logging_level='Silent',
                           eval_metric='Accuracy',
                           allow_writing_files=True
                          )
    
    model.fit(X_train,
              y_train,
              cat_features=cat_features_idxs,
              eval_set=(X_val, y_val),
              plot=True
             )
    
    import pandas as pd
    # Per-iteration metrics written by CatBoost during fit()
    test_error = pd.read_csv('catboost_info/test_error.tsv', sep='\t')
    learn_error = pd.read_csv('catboost_info/learn_error.tsv', sep='\t')
    best_row = test_error['Accuracy'].idxmax()  # first row with the highest validation accuracy
    val_score = test_error.loc[best_row, 'Accuracy']
    best_iter = int(test_error.loc[best_row, 'iter'])
    train_score = learn_error.loc[learn_error['iter'] == best_iter, 'Accuracy'].values[0]
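
The same per-iteration history is also available in memory through model.get_evals_result(), which avoids reading files from disk. The sketch below assumes the result is a nested dict of the form {dataset: {metric: [value per iteration]}} with dataset keys 'learn' and 'validation'; the helper name best_from_history and its higher_is_better flag are hypothetical.

```python
def best_from_history(evals_result, dataset="validation", metric="Accuracy",
                      higher_is_better=True):
    """Return (best_value, best_iteration) from a per-iteration metric history.

    `evals_result` is assumed to look like
    {'learn': {'Accuracy': [...]}, 'validation': {'Accuracy': [...]}}.
    """
    history = evals_result[dataset][metric]
    # Use max for metrics such as Accuracy, min for losses such as Logloss
    pick = max if higher_is_better else min
    # On ties, the earliest iteration is returned
    best_iter = pick(range(len(history)), key=history.__getitem__)
    return history[best_iter], best_iter
```

For example, val_score, best_iter = best_from_history(model.get_evals_result()), after which the training value at that iteration would be model.get_evals_result()['learn']['Accuracy'][best_iter], provided the metric is tracked on the training set.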
    

    2) If you have pandas installed, pass as_pandas=True to cv so that cv_data is returned as a DataFrame. You can then write, e.g., cv_data['test-Accuracy-mean'].max().

    https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_cv-docpage/
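
As a rough sketch of how the cv() result could be summarized, the hypothetical helper below returns the best mean and the step where it occurs. It follows the 'test-Accuracy-mean' column name used in the question; the matching 'test-Accuracy-std' column is an assumption and is only read if present. It accepts either a DataFrame or a plain dict of lists.

```python
def best_cv_step(cv_data, metric="Accuracy"):
    """Return the step with the highest mean test metric across folds."""
    means = list(cv_data[f"test-{metric}-mean"])
    # Earliest step with the highest mean (assumes higher is better)
    best_step = max(range(len(means)), key=means.__getitem__)
    result = {"step": best_step, "mean": means[best_step]}
    std_column = f"test-{metric}-std"  # assumed column name
    if std_column in cv_data:  # key/column membership for dict or DataFrame
        result["std"] = list(cv_data[std_column])[best_step]
    return result
```

Calling best_cv_step(cv_data) on the result from the question would give the same value and step as the np.max / np.argmax pair there, plus the spread across folds when that column exists.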

    You could also read the output files as above; in this case there will be a pair of folders for each fold.

    Hope this helps!