I want to train a lgb model with custom metric : f1_score
with weighted
average.
I went through the advanced examples of lightgbm over here and found the implementation of custom binary error function. I implemented as similar function to return f1_score as shown below.
def f1_metric(preds, train_data):
labels = train_data.get_label()
return 'f1', f1_score(labels, preds, average='weighted'), True
I tried to train the model by passing feval
parameter as f1_metric
as shown below.
evals_results = {}
bst = lgb.train(params,
dtrain,
valid_sets= [dvalid],
valid_names=['valid'],
evals_result=evals_results,
num_boost_round=num_boost_round,
early_stopping_rounds=early_stopping_rounds,
verbose_eval=25,
feval=f1_metric)
Then I am getting ValueError: Found input variables with inconsistent numbers of samples:
The training set is being passed to the function rather than the validation set.
How can I configure such that the validation set is passed and f1_score is returned?
The docs are a bit confusing. When describing the signature of the function that you pass to feval, they call its parameters preds and train_data, which is a bit misleading.
But the following seems to work:
from sklearn.metrics import f1_score
def lgb_f1_score(y_hat, data):
y_true = data.get_label()
y_hat = np.round(y_hat) # scikits f1 doesn't like probabilities
return 'f1', f1_score(y_true, y_hat), True
evals_result = {}
clf = lgb.train(param, train_data, valid_sets=[val_data, train_data], valid_names=['val', 'train'], feval=lgb_f1_score, evals_result=evals_result)
lgb.plot_metric(evals_result, metric='f1')
To use more than one custom metric, define one overall custom metrics function just like above, in which you calculate all metrics and return a list of tuples.
Edit: Fixed code, of course with F1 bigger is better should be set to True.