python scikit-learn preprocessor optbinning

Understanding BinningProcess output


I want to integrate optbinning.BinningProcess into my Pipeline, but first I need to understand its output. I was expecting bins after fit_transform, but I obtained values close to my target variable.

Can someone explain the output to me, please, or point out whether I am doing something wrong?

Here is a reproducible example:

import pandas as pd
from random import choices, uniform
from optbinning import BinningProcess

# Toy data: one numeric feature, one categorical feature, and a continuous target
df = pd.DataFrame({'continuous_feature': choices(range(0, 30), k=100),
                   'cat_feature': choices(['A', 'B', 'C'], k=100),
                   'target': [uniform(15, 16) for _ in range(100)]})
all_features = ["continuous_feature", "cat_feature"]

X = df.loc[:, all_features]
y = df.loc[:, 'target']

# Returns values close to the target rather than bin labels
BinningProcess(all_features, categorical_variables=['cat_feature']).fit_transform(X, y)

Solution

  • As mentioned by Nick ODell, the value returned by the binning algorithm for each bin is "the average of the target variable for that bin", which is why the output looks close to the target. If you want the bins themselves instead, pass the option metric="bins", e.g. fit_transform(X, y, metric="bins").
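
To make the difference between the two outputs concrete, here is a small standard-library sketch of the idea. It is not optbinning's algorithm: the toy data, the split points, and the labels are all made up for illustration, whereas optbinning computes the split points itself. It only shows why replacing each value by its bin's target mean gives numbers close to the target, and what a bin-label output looks like instead.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical toy data: one continuous feature and a continuous target.
feature = [2, 5, 11, 14, 21, 27]
target = [15.1, 15.3, 15.4, 15.6, 15.7, 15.9]

# Hand-picked split points (an assumption; optbinning would optimize these).
splits = [10, 20]
labels = ["(-inf, 10)", "[10, 20)", "[20, inf)"]

# Index of the bin each feature value falls into (bins are left-closed).
bin_index = [bisect_right(splits, x) for x in feature]

# Mean of the target within each bin.
bin_mean = {
    i: mean(t for t, b in zip(target, bin_index) if b == i)
    for i in set(bin_index)
}

# Each value replaced by its bin's target mean (the kind of output in the question).
as_mean = [bin_mean[b] for b in bin_index]

# Each value replaced by its bin's label (the kind of output metric="bins" asks for).
as_bins = [labels[b] for b in bin_index]

print(as_mean)
print(as_bins)
```

Values that fall into the same bin get the same number in as_mean, so the column has as many distinct values as there are bins, and every one of them lies within the range of the target.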