I am using LightGBM for a binary classification problem, but I cannot get a binary prediction (0 or 1).
From the documentation, I understood that the parameter objective: binary is for binary prediction and cross-entropy is for probability prediction.
import lightgbm as lgb

d_train = lgb.Dataset(train_X[features], label=train_y, categorical_feature=Cat_columns)
d_valid = lgb.Dataset(val_X[features], label=val_y, categorical_feature=Cat_columns)
params = {
    'objective': 'binary',
    'boosting': 'goss',
    'metric': 'binary_error',
    'learning_rate': 0.1,
    'num_leaves': 31,
    'max_depth': 9,
    'min_data_in_leaf': 20,
    'max_delta_step': 0,
    'device_type': 'cpu',
    'verbosity': 1,
}
Model2 = lgb.train(params, d_train, categorical_feature=Cat_columns, num_boost_round=10, valid_sets=[d_train, d_valid], feval=None, early_stopping_rounds=50)
Model2.predict(train_X[features])
array([0.00510775, 0.00510775, 0.00510775, ..., 0.00510775, 0.00510775,
0.0319719 ])
The model always returns an array of probabilities, and I cannot find any setting that gives a binary prediction.
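Based on the LightGBM documentation, I don't think you can get predicted classes directly from the Booster returned by lgb.train. With the binary objective, its predict method returns predicted probabilities by default. (The scikit-learn wrapper, LGBMClassifier, does return class labels from its predict method.)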
You can convert the probabilities to classes using a threshold. Choosing the threshold is tricky: it depends on the nature of the problem you're solving and on the level of imbalance in the training data.
For a highly imbalanced dataset, the standard threshold of 0.5 is often not a good choice.
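As a minimal sketch of the conversion, assuming a cut-off of 0.5 and using a small made-up array in place of the output of Model2.predict (the values and the names probs and pred_classes are only for illustration):

```python
import numpy as np

# Stand-in for the array returned by Model2.predict(...); values are made up.
probs = np.array([0.00510775, 0.0319719, 0.62, 0.5, 0.91])

threshold = 0.5  # assumed cut-off; tune it for your own problem

# Probabilities at or above the threshold become class 1, the rest class 0.
pred_classes = (probs >= threshold).astype(int)
print(pred_classes)  # [0 0 1 1 1]
```

With the model from the question, the same idea would read (Model2.predict(train_X[features]) >= threshold).astype(int).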
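One possible way to pick a threshold, sketched here under assumptions: scan a grid of candidate cut-offs on a validation set and keep the one with the best F1 score. F1 is just one criterion you might use, not something LightGBM prescribes, and the helper names and the toy data below are hypothetical.

```python
import numpy as np

def f1_at_threshold(y_true, probs, threshold):
    """F1 score of the positive class when cutting probs at threshold."""
    pred = (probs >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def best_threshold(y_true, probs, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    scores = [f1_at_threshold(y_true, probs, t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])

# Made-up imbalanced validation set: 8 negatives, 2 positives.
y_val = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
p_val = np.array([0.01, 0.02, 0.03, 0.04, 0.07, 0.11, 0.13, 0.41, 0.28, 0.47])

t = best_threshold(y_val, p_val)
print("chosen threshold:", round(t, 2))
print("F1 at chosen threshold:", f1_at_threshold(y_val, p_val, t))
print("F1 at 0.5:", f1_at_threshold(y_val, p_val, 0.5))
```

On this toy data no predicted probability reaches 0.5, so the default cut-off labels every row as class 0, while a lower cut-off recovers both positives. In practice you would pass val_y and Model2.predict(val_X[features]) instead of the made-up arrays.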
Coming back to your original question: setting the parameter objective: 'binary' only tells the model what kind of problem it is solving, which in this case is binary classification. It requires your target variable to take values in {0, 1}.
For more information on LightGBM parameters, refer to the Parameters page of the LightGBM documentation.
Hope this helps.
Have a good day.