python catboost

CatBoost CTR settings


I use CatBoost for a multiclass classification task with categorical data. I was checking the default value of the ctr parameter, which controls the transformation from categorical to numerical data. The documentation says that the default value for ctr is "None".

As I understand it, this is an optional step.

The algorithm did work on my dataset, so I was wondering whether it uses:

  • the gradient boosting properties themselves to handle the categorical data,
  • a default method from Borders, Buckets, BinarizedTargetMeanValue, Counter,
  • or, by default, the formula given in the example, avg_target = (countInClass + prior) / (totalCount + 1), which looks like "Buckets"?

Solution

  • For multiclass problems, CatBoost uses the Buckets method to calculate ctrs.

    The formula you wrote is correct. A separate ctr feature is calculated for each class. Here countInClass is the number of objects that come before the given one in a random permutation, have the same category value, and belong to that class. totalCount is the number of objects before the given one that have the same category value, regardless of class.
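
    The per-class calculation described above can be sketched in plain Python. This is only an illustration of the formula, not CatBoost's actual implementation: the function name `ordered_class_ctrs`, the `order` argument, and the prior value of 0.5 are all assumptions made for this example.

```python
import random
from collections import defaultdict


def ordered_class_ctrs(categories, labels, classes, prior=0.5, order=None, seed=0):
    """Illustrative per-class ctr: (countInClass + prior) / (totalCount + 1).

    Hypothetical helper, not part of the CatBoost API. Counts only use
    objects that come earlier in the permutation.
    """
    n = len(categories)
    if order is None:
        # Random permutation of the objects (assumed; CatBoost manages its own).
        order = list(range(n))
        random.Random(seed).shuffle(order)

    total_count = defaultdict(int)     # category -> objects seen so far
    count_in_class = defaultdict(int)  # (category, class) -> objects seen so far
    ctrs = [None] * n

    for idx in order:
        cat = categories[idx]
        # One ctr value per class, computed before this object is counted.
        ctrs[idx] = {
            c: (count_in_class[(cat, c)] + prior) / (total_count[cat] + 1)
            for c in classes
        }
        total_count[cat] += 1
        count_in_class[(cat, labels[idx])] += 1

    return ctrs


example = ordered_class_ctrs(
    categories=["a", "a", "b", "a"],
    labels=[0, 1, 0, 0],
    classes=[0, 1],
    order=[0, 1, 2, 3],  # fixed order so the result is reproducible
)
```

    With the fixed order, the second object (category "a", one earlier "a" of class 0) gets (1 + 0.5) / (1 + 1) = 0.75 for class 0 and (0 + 0.5) / (1 + 1) = 0.25 for class 1.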