Tags: python, machine-learning, lightgbm

Rearranging LGBMClassifier predict_proba outputs columns


I am training an LGBMClassifier in order to use its predict_proba method. The target has 3 classes: a, b, and c. I want predict_proba to return the class probabilities in the column order b, a, c.
Is there a way to make LGBMClassifier.predict_proba return its columns in that order?

import pandas as pd
from lightgbm import LGBMClassifier
import numpy as np

#data
features = ['feat_1']
TARGET = 'target'
df = pd.DataFrame({
    'feat_1':np.random.uniform(size=100),
    'target':np.random.choice(a=['b','c','a'], size=100)
})

#training
model = LGBMClassifier()
model.fit(df[features], df[TARGET])
print(model.classes_)

['a' 'b' 'c']

Things I Have Tried

  1. Just rearrange the .classes_ attribute.
    model.classes_ = ['b','a','c']

AttributeError: can't set attribute 'classes_'

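  2. Manually reorder the columns based on the .classes_ attribute:
desired_order = ['b','a','c']
# position of each desired label within the model's (sorted) classes_
correct_idx = [list(model.classes_).index(val) for val in desired_order]
# test: a held-out DataFrame with the same feature columns
model.predict_proba(test[features])[:, correct_idx]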

Approach 2 works, but I would rather not have to permute the columns on every predict_proba call.
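If the goal is only to avoid repeating the permutation, approach 2 can be wrapped once. The sketch below is a hypothetical helper, `OrderedProba` (not part of LightGBM or scikit-learn), that computes the column index at construction time and forwards `predict_proba` and `predict` to the fitted model. It assumes only that the wrapped model exposes `classes_` and `predict_proba`.

```python
import numpy as np


class OrderedProba:
    """Hypothetical wrapper: a fitted classifier whose predict_proba columns follow `order`.

    The column permutation is computed once here instead of on every call.
    """

    def __init__(self, model, order):
        classes = list(model.classes_)
        # `order` must contain exactly the labels the model was fitted on.
        if len(order) != len(classes) or set(order) != set(classes):
            raise ValueError(f"order {order!r} must be a permutation of {classes!r}")
        self.model = model
        self.classes_ = np.asarray(order)
        # Index of each requested label within the model's own (sorted) classes_.
        self._idx = [classes.index(label) for label in order]

    def predict_proba(self, X):
        # Same probabilities as the wrapped model, columns permuted to `order`.
        return self.model.predict_proba(X)[:, self._idx]

    def predict(self, X):
        # predict returns labels, not columns, so nothing needs reordering.
        return self.model.predict(X)
```

Usage would be `ordered = OrderedProba(model, ['b', 'a', 'c'])` followed by `ordered.predict_proba(test[features])`. A thin wrapper is used rather than a subclass of LGBMClassifier so that scikit-learn's parameter handling is left untouched; since the index is cached, the wrapper should be rebuilt if the model is re-fitted on a different set of classes.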


Solution

  • The labels come out in the order ["a", "b", "c"] because the scikit-learn framework sorts them (i.e. via numpy.unique(), which for strings means lexicographic order). This behaviour cannot easily be disabled.

    You can work around this by re-encoding the labels as integers in the order you want:

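    from sklearn.preprocessing import LabelEncoder
    
    # Fix the encoding by hand: "b" -> 0, "a" -> 1, "c" -> 2
    le = LabelEncoder()
    le.classes_ = np.asarray(["b", "a", "c"])
    
    df[TARGET] = le.transform(df[TARGET])
    
    # Re-fit on the integer-encoded target
    model = LGBMClassifier()
    model.fit(df[features], df[TARGET])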

    After fitting on the re-encoded target, LGBMClassifier.predict_proba returns one column per class, and the three columns correspond to the "b", "a" and "c" labels, respectively.

    The downside is that the predict method will also return integer class labels.