LightGBM produces non-binary predictions under binary objective?


I am using LightGBM for a binary classification problem, but I cannot get binary predictions (0 or 1).

From the documentation, I understood that the parameter objective: binary is for binary predictions and cross-entropy is for probability predictions.

d_train = lgb.Dataset(train_X[features], label=train_y,categorical_feature=Cat_columns)
d_valid = lgb.Dataset(val_X[features], label=val_y,categorical_feature=Cat_columns)

params = {
 'objective':'binary',
 'boosting':'goss',
 'metric': 'binary_error',
 'learning_rate': 0.1,
 'num_leaves': 31,
 'max_depth': 9,
 'min_data_in_leaf': 20,
 'max_delta_step': 0,
 'device_type':'cpu',
 'verbosity':1}

Model2 = lgb.train(params, d_train,categorical_feature=Cat_columns, num_boost_round =10, valid_sets=[d_train,d_valid],feval=None,early_stopping_rounds=50)

Model2.predict(train_X[features])

array([0.00510775, 0.00510775, 0.00510775, ..., 0.00510775, 0.00510775,
       0.0319719 ])

The model always returns an array of probabilities, and I cannot find any setting that gives binary predictions.


Solution

  • Based on the LightGBM documentation, I don't think you can get predicted classes directly from a model trained with lgb.train. By default, its predict method returns predicted probabilities.

    You can convert the probabilities to classes using a threshold. Choosing the threshold is tricky: it depends on the nature of the problem you are solving and on the level of class imbalance in the training data.
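As a minimal sketch of that conversion: here `probs` stands in for the array returned by `Model2.predict(...)`, the helper name `to_labels` is hypothetical (it is not part of LightGBM), and 0.5 is only a placeholder threshold.

```python
def to_labels(probs, threshold=0.5):
    """Map predicted probabilities to 0/1 class labels.

    Hypothetical helper, not a LightGBM function. A probability at or
    above the threshold becomes class 1, anything below becomes class 0.
    """
    return [int(p >= threshold) for p in probs]


# A few of the probabilities shown in the question's output
probs = [0.00510775, 0.00510775, 0.0319719]

labels_default = to_labels(probs)        # threshold of 0.5
labels_low = to_labels(probs, 0.02)      # a lower, purely illustrative threshold

print(labels_default)
print(labels_low)
```

Because `predict` returns a NumPy array, the same step is usually written as `(probs >= threshold).astype(int)`. The list comprehension above works on both plain lists and arrays.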

    For a highly imbalanced dataset, the standard threshold of 0.5 is often a poor choice.
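One possible way to pick a threshold for imbalanced data is to scan candidate values on a validation set and keep the one with the best F1 score. The sketch below is only an illustration: F1 is an assumed criterion (the right metric depends on your problem), the function names are hypothetical, and the validation labels and probabilities are made-up toy values rather than output from the model in the question.

```python
def f1_at(y_true, probs, threshold):
    """F1 score of the positive class when cutting probs at threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        pred = int(p >= threshold)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at(y_true, probs, t))


# Toy, made-up validation set: 2 positives out of 10 rows
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p_val = [0.01, 0.02, 0.01, 0.03, 0.02, 0.05, 0.04, 0.20, 0.10, 0.35]

candidates = [i / 100 for i in range(1, 100)]
chosen = best_threshold(y_val, p_val, candidates)

print("F1 at 0.5:", f1_at(y_val, p_val, 0.5))
print("chosen threshold:", chosen, "F1:", f1_at(y_val, p_val, chosen))
```

On this toy data a threshold of 0.5 labels every row as 0, so both positives are missed, while a threshold between the highest negative score and the lowest positive score separates the classes. In practice, choose the threshold on held-out data, not on the training set.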

    Coming back to your original question: setting the parameter objective: 'binary' only tells the model what kind of problem it is solving, which in this case is binary classification. It requires the target variable to take values in {0, 1}; it does not make predict return class labels.

    For more information on LightGBM parameters, refer to the LightGBM Parameters documentation.

    Hope this helps.

    Have a good day.