I am training an LGBMClassifier for the purpose of using its predict_proba method. The target has 3 classes: a, b, and c. I want to ensure that the model's predict_proba outputs the class probabilities as columns in the order b, a, c.
Is there a way to ensure that the output of LGBMClassifier predict_proba has the above ordering?
import pandas as pd
from lightgbm import LGBMClassifier
import numpy as np
#data
features = ['feat_1']
TARGET = 'target'
df = pd.DataFrame({
'feat_1':np.random.uniform(size=100),
'target':np.random.choice(a=['b','c','a'], size=100)
})
#training
model = LGBMClassifier()
model.fit(df[features], df[TARGET])
print(model.classes_)
['a','b','c']
1. Setting the .classes_ attribute.
model.classes_ = ['b','a','c']
AttributeError: can't set attribute 'classes_'
2. Reordering the columns using the .classes_ attribute.
desired_order = ['b','a','c']
correct_idx = [list(model.classes_).index(val) for val in desired_order]
model.predict_proba(test[features])[:, correct_idx]
Approach #2 works, but I would rather not have to permute the column order on every predict_proba call.
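The column permutation from approach #2 can at least be written once and reused. The following is only a sketch: reorder_proba is a hypothetical helper name, and the two arrays are made-up stand-ins for model.classes_ and the output of model.predict_proba.

```python
import numpy as np

def reorder_proba(proba, classes, desired_order):
    """Return the columns of `proba` rearranged to follow `desired_order`.

    `classes` is the column order the model actually uses (model.classes_).
    """
    # Map each label to the column it currently occupies.
    position = {label: i for i, label in enumerate(np.asarray(classes).tolist())}
    idx = [position[label] for label in desired_order]
    return np.asarray(proba)[:, idx]

# Made-up stand-ins for model.classes_ and model.predict_proba(...).
classes = np.array(["a", "b", "c"])
proba = np.array([[0.2, 0.5, 0.3],
                  [0.6, 0.1, 0.3]])

reordered = reorder_proba(proba, classes, ["b", "a", "c"])
print(reordered)
```

This still permutes on every call, but the index lookup lives in one place instead of at each call site.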
Originally, the labels come in the order ["a", "b", "c"] because the scikit-learn framework sorts them lexicographically (i.e. using numpy.unique()). This behaviour cannot be disabled easily.
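The sorting can be seen in isolation with numpy.unique itself. A minimal sketch, with a made-up labels array that deliberately lists "b" first:

```python
import numpy as np

# Hypothetical labels, first appearing in the desired order b, a, c.
labels = np.array(["b", "a", "c", "b", "c"])

# numpy.unique returns the distinct values in sorted order, so the
# order of first appearance ("b" before "a") is not preserved.
classes, encoded = np.unique(labels, return_inverse=True)

print(classes)   # distinct labels, sorted
print(encoded)   # index of each label within `classes`
```

Whatever order the labels arrive in, "a" ends up at index 0, which is why the first predict_proba column belongs to "a".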
You may work around this by re-mapping labels from strings to integers:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.classes_ = np.asarray(["b", "a", "c"])
df[TARGET] = le.transform(df[TARGET])
Once fitted on the encoded target, the LGBMClassifier predict_proba method will return three probabilities per row, corresponding to the "b", "a" and "c" labels, respectively.
The downside is that the predict method will return integer class labels as well.
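If the string labels are still needed, the integers can be mapped back by indexing into the same label order. A minimal sketch, assuming pred stands in for the integer array returned by model.predict (the values below are made up, not real model output):

```python
import numpy as np

# Same order that was assigned to le.classes_ above.
label_order = np.asarray(["b", "a", "c"])

# Hypothetical integer predictions, standing in for model.predict(...).
pred = np.array([0, 2, 1, 0])

# Integer k corresponds to the label at position k in label_order.
pred_labels = label_order[pred]
print(pred_labels)
```

With the LabelEncoder from above, le.inverse_transform(pred) should give the same result.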