Tags: python, machine-learning, scikit-learn, nlp, lightgbm

LightGBM on Numerical+Categorical+Text Features >> TypeError: Unknown type of parameter:boosting_type, got:dict


I'm trying to train a LightGBM model on a dataset consisting of numerical, categorical, and textual data. However, during training I get an error. Here is my code:

params = {
'num_class':5,
'max_depth':8,
'num_leaves':200,
'learning_rate': 0.05,
'n_estimators':500
}

clf = LGBMClassifier(params)
data_processor = ColumnTransformer([
    ('numerical_processing', numerical_processor, numerical_features),
    ('categorical_processing', categorical_processor, categorical_features),
    ('text_processing_0', text_processor_1, text_features[0]),
    ('text_processing_1', text_processor_1, text_features[1])
                                    ]) 
pipeline = Pipeline([
    ('data_processing', data_processor),
    ('lgbm', clf)
                    ])
pipeline.fit(X_train, y_train)

and the error is:

TypeError: Unknown type of parameter:boosting_type, got:dict

Here's my pipeline: [image not available]

I basically have two textual features, both of which are some form of names, and I'm mainly performing stemming on them.

Any pointers would be highly appreciated.


Solution

  • You are constructing the classifier incorrectly, and that is what raises the error. You can reproduce it without the pipeline:

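    import numpy as np
    from lightgbm import LGBMClassifier

    params = {
    'num_class':5,
    'max_depth':8,
    'num_leaves':200,
    'learning_rate': 0.05,
    'n_estimators':500
    }
    
    # The dict is passed as a single positional argument (this is the mistake)
    clf = LGBMClassifier(params)
    # Fit on random data: 50 rows, 2 features, 5 classes
    clf.fit(np.random.uniform(0,1,(50,2)),np.random.randint(0,5,50))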
    

    This gives the same error:

    TypeError: Unknown type of parameter:boosting_type, got:dict
    

    Instead, unpack the dict so that each entry is passed as a keyword argument:

    clf = LGBMClassifier(**params)
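
The error message suggests that the whole dict was bound to the constructor's `boosting_type` parameter. A minimal sketch of that mechanism, using a hypothetical `ToyClassifier` as a stand-in (it is not LightGBM's real class, and its parameter list is only an assumption for illustration):

```python
# Toy stand-in for an estimator constructor; NOT LightGBM's real class.
class ToyClassifier:
    def __init__(self, boosting_type="gbdt", num_leaves=31, learning_rate=0.1):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.learning_rate = learning_rate


params = {"num_leaves": 200, "learning_rate": 0.05}

# Positional call: the whole dict is bound to the first parameter,
# and the other parameters keep their defaults.
wrong = ToyClassifier(params)
print(type(wrong.boosting_type).__name__)  # dict
print(wrong.num_leaves)                    # 31

# Unpacked call: each key is matched to the parameter with the same name.
right = ToyClassifier(**params)
print(right.boosting_type, right.num_leaves, right.learning_rate)  # gbdt 200 0.05
```

With the positional call nothing fails at construction time, which would explain why the error only shows up later, once the parameters are actually read during `fit`.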
    

    With the classifier constructed that way, a small example pipeline runs:

    import numpy as np
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.compose import ColumnTransformer
    
    numerical_processor = StandardScaler()
    categorical_processor = OneHotEncoder()
    numerical_features = ['A']
    categorical_features = ['B']
    
    data_processor = ColumnTransformer([('numerical_processing', numerical_processor, numerical_features),
    ('categorical_processing', categorical_processor, categorical_features)])
    
    # size=100 draws 100 values; uniform(100) would return a single scalar
    X_train = pd.DataFrame({'A':np.random.uniform(size=100),
    'B':np.random.choice(['j','k'],100)})
    
    y_train = np.random.randint(0,5,100)
    
    pipeline = Pipeline([('data_processing', data_processor),('lgbm', clf)])
    
    pipeline.fit(X_train, y_train)
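
The example above leaves out the two text columns from the question. A rough sketch of how they could be added to the `ColumnTransformer` follows. The question does not show what `text_processor_1` is, so `TfidfVectorizer` is only an assumption here, and the column names and toy data are hypothetical. The text columns are selected with a plain string (as the question does with `text_features[0]`), so the vectorizer receives a 1-D series of documents:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
X = pd.DataFrame({
    "A": [0.1, 0.5, 0.9, 0.3],
    "B": ["j", "k", "j", "k"],
    "name_1": ["acme tools", "acme paints", "zenith tools", "zenith paints"],
    "name_2": ["north", "south", "north", "east"],
})

data_processor = ColumnTransformer([
    ("numerical_processing", StandardScaler(), ["A"]),
    ("categorical_processing", OneHotEncoder(), ["B"]),
    # A string selector (not a list) passes a 1-D column to the vectorizer.
    ("text_processing_0", TfidfVectorizer(), "name_1"),
    ("text_processing_1", TfidfVectorizer(), "name_2"),
])

# One row per sample; the columns are the concatenated outputs of all steps.
features = data_processor.fit_transform(X)
print(features.shape)
```

The resulting `data_processor` can be used as the first pipeline step in the same way as before, with `LGBMClassifier(**params)` as the final step.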