
Optuna LightGBM integration giving categorical features error


I'm creating a model using the Optuna LightGBM integration. My training set has some categorical features, and I pass those features to the model through the lgb.Dataset class. Here is the code I'm using (NOTE: X_train, X_val, y_train, y_val are all pandas dataframes).

import lightgbm as lgb
from sklearn.model_selection import train_test_split

grid = {
    'boosting': 'gbdt',
    'metric': ['huber', 'rmse', 'mape'],
    'verbose': 1
}

X_train, X_val, y_train, y_val = train_test_split(X, y)

# Columns whose names start with 'cat' are treated as categorical
cat_features = [col for col in X_train if col.startswith('cat')]

dval = lgb.Dataset(X_val, label=y_val, categorical_feature=cat_features)
dtrain = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)

model = lgb.train(
    grid,
    dtrain,
    valid_sets=[dval],
    early_stopping_rounds=100)

Every time the lgb.train function is called, I get the following user warning:

UserWarning: categorical_column in param dict is overridden.

I believe that LightGBM is not treating my categorical features the way it should. How can I fix this issue? Am I using the parameter correctly?


Solution

  • If you refer to the categorical columns by name rather than by index, also pass the feature_name parameter, as the documentation states. Note that feature_name must list every column in the data, not only the categorical ones.

    That said, your dval and dtrain will be initialized as follows:

    dval = lgb.Dataset(X_val, label=y_val, feature_name=list(X_val.columns), categorical_feature=cat_features)
    dtrain = lgb.Dataset(X_train, label=y_train, feature_name=list(X_train.columns), categorical_feature=cat_features)
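To keep the two name lists consistent, the arguments can be derived from the column names in one place. The following is only a minimal sketch: dataset_kwargs and build_datasets are hypothetical helper names, not part of LightGBM, and the 'cat' prefix rule is taken from the question's code. Passing reference=dtrain to the validation set is an addition that the original answer does not make.

```python
def dataset_kwargs(columns, prefix="cat"):
    """Build feature_name / categorical_feature arguments from column names.

    Hypothetical helper: feature_name covers every column, while
    categorical_feature keeps only the names starting with `prefix`.
    """
    feature_name = [str(c) for c in columns]
    categorical = [c for c in feature_name if c.startswith(prefix)]
    return {"feature_name": feature_name, "categorical_feature": categorical}


def build_datasets(X_train, y_train, X_val, y_val, prefix="cat"):
    """Create training and validation Datasets with matching feature settings.

    Assumes X_train and X_val are pandas DataFrames with the same columns.
    """
    # Imported here so dataset_kwargs can be used without LightGBM installed.
    import lightgbm as lgb

    kwargs = dataset_kwargs(X_train.columns, prefix)
    dtrain = lgb.Dataset(X_train, label=y_train, **kwargs)
    # reference=dtrain ties the validation set to the training set's binning.
    dval = lgb.Dataset(X_val, label=y_val, reference=dtrain, **kwargs)
    return dtrain, dval


# Example with made-up column names
print(dataset_kwargs(["cat_city", "price", "cat_brand"]))
```

The resulting dtrain and dval can then be passed to lgb.train exactly as in the question.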