I want to create a pipeline that contains every step of the model training process. After importing the relevant libraries and defining my variables, I created the following structure as an experiment. I used the Telco churn dataset.
ohe_f =["gender","SeniorCitizen","Partner","Dependents","PhoneService","MultipleLines",
X_train, X_test, y_train, y_test = train_test_split(X,
pipeline = Pipeline(steps = [['smote', SMOTE(random_state=11)],
['scaler', MinMaxScaler()],
['encoder', OneHotEncoder(),ohe_f],
['classifier', LogisticRegression(random_state=11)]])
stratified_kfold = StratifiedKFold(n_splits=3,
param_grid = {'classifier__C':[0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(estimator=pipeline,
When I start training the model I get the following error. How can I solve it?
_RemoteTraceback Traceback (most recent call last)
Traceback (most recent call last):
ValueError: dictionary update sequence element #2 has length 3; 2 is required
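The error comes from the 'encoder' step: every entry in steps must be a (name, estimator) pair, but ['encoder', OneHotEncoder(), ohe_f] has three elements, and a Pipeline step cannot be restricted to a subset of columns this way. You need to split your preprocessing into two parts: one to process the numeric features (with the min-max scaler) and another to process the categorical features (with the one-hot encoder). You can use the ColumnTransformer class from scikit-learn: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html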
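As a rough illustration of that split, here is a minimal sketch. The toy DataFrame, the shortened ohe_f list, and the num_f list are assumptions standing in for the real Telco data and column lists, which are not fully shown in the question. SMOTE is left out so that the example depends only on scikit-learn and pandas.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the Telco churn dataset.
rng = np.random.default_rng(11)
n = 60
X = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "Partner": rng.choice(["Yes", "No"], size=n),
    "tenure": rng.integers(0, 72, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
})
y = np.array([0, 1] * (n // 2))

# Assumed column lists: categorical columns to encode, numeric ones to scale.
ohe_f = ["gender", "Partner"]
num_f = ["tenure", "MonthlyCharges"]

# Each transformer is a (name, transformer, columns) triple; this is the
# place where a column list is allowed, unlike in Pipeline steps.
preprocessor = ColumnTransformer(transformers=[
    ("scaler", MinMaxScaler(), num_f),
    ("encoder", OneHotEncoder(handle_unknown="ignore"), ohe_f),
])

# Pipeline steps stay (name, estimator) pairs.
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(random_state=11)),
])

stratified_kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=11)
param_grid = {"classifier__C": [0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           cv=stratified_kfold)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

If you want to keep SMOTE, note that it needs numeric input, so it would likely belong after the preprocessor rather than first. scikit-learn's own Pipeline does not accept samplers, so the usual approach is to build the pipeline with imblearn.pipeline.Pipeline instead, keeping the same (name, estimator) step format.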