python machine-learning scikit-learn data-science one-hot-encoding

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,). The error is not resolved even after specifying the columns as indexes

    trf1=ColumnTransformer([("Infuse_val",SimpleImputer(strategy="mean"),[0])],remainder="passthrough")
    trf4=ColumnTransformer([("One_hot",OneHotEncoder(sparse=False,handle_unknown="ignore"),[1,4])],remainder="passthrough")
    trf2=ColumnTransformer([("Ord_encode",OrdinalEncoder(categories=["Strong","Mild"]),[3])],remainder="passthrough")
    trf3=ColumnTransformer([("scale",StandardScaler(),[0,2])],remainder="passthrough")
    pipe = Pipeline([
        ('trf1',trf1),
        ('trf2',trf2),
        ('trf3',trf3),
        ('trf4',trf4),
    ])
    pipe.fit(x_train,y_train)

Error

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

The table is: [image not available]

I don't understand what the error in my code is.

Solution

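The error isn't about the column transformers; it's about the OrdinalEncoder. categories needs to be a list of lists: one inner list per encoded column, holding that column's categories in the desired order. A flat list like ["Strong","Mild"] is read as two entries, one per feature, which doesn't match the single column being encoded. Since you have just one column, categories=[["Strong","Mild"]] should work.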
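As a minimal sketch of the difference (the array X below is made-up data standing in for your column 3, not your actual table), the flat list reproduces the error and the nested list fixes it:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column data standing in for the ordinal column.
X = np.array([["Strong"], ["Mild"], ["Strong"]], dtype=object)

# Flat list: interpreted as one entry per feature, so 2 entries vs. 1 column fails.
try:
    OrdinalEncoder(categories=["Strong", "Mild"]).fit(X)
    flat_error = None
except ValueError as exc:
    flat_error = str(exc)

# List of lists: one inner list per column, in the order you want encoded.
enc = OrdinalEncoder(categories=[["Strong", "Mild"]])
encoded = enc.fit_transform(X)  # "Strong" -> 0.0, "Mild" -> 1.0

print(flat_error)
print(encoded.ravel())
```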

With just two categories, most subsequent algorithms won't care which one is encoded as 0 and which as 1, so here you could just leave categories at its default, "auto".
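For illustration, here is roughly what the default does on made-up data (X is hypothetical): the categories are learned from the data in sorted order, so the mapping may be the reverse of the one you would have written by hand.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column data.
X = np.array([["Strong"], ["Mild"], ["Strong"]], dtype=object)

enc = OrdinalEncoder()  # categories="auto" is the default
encoded = enc.fit_transform(X)

# Categories are learned in sorted order: "Mild" -> 0.0, "Strong" -> 1.0.
learned = enc.categories_[0].tolist()
print(learned, encoded.ravel())
```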

Finally, you'll have problems with your column transformers. They change the order (and names) of the columns, so by the time a later step scales columns 0 and 2, those might no longer be the two numeric columns. The output order is predictable (each transformer's output in the order listed, followed by the passthrough columns), so you could keep track manually. But I would suggest a single column transformer with multiple pipelines instead.
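A rough sketch of that single-transformer layout follows. The DataFrame and its column names are invented for the example, and I'm assuming from your code that columns 0 and 2 are numeric, 3 is the ordinal one, and 1 and 4 are nominal. Every index refers to the original input, so nothing shifts between steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical data with the assumed layout: 0 and 2 numeric, 1 and 4 nominal, 3 ordinal.
X = pd.DataFrame({
    "num_a": [1.0, np.nan, 3.0, 4.0],
    "cat_a": ["x", "y", "x", "y"],
    "num_b": [10.0, 20.0, 30.0, 40.0],
    "level": ["Strong", "Mild", "Mild", "Strong"],
    "cat_b": ["p", "q", "q", "p"],
})

# Impute, then scale. Imputing column 2 as well is a no-op when it has no missing values.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# One ColumnTransformer: all indices are positions in the original input.
preprocess = ColumnTransformer(
    [
        ("num", numeric, [0, 2]),
        ("ord", OrdinalEncoder(categories=[["Strong", "Mild"]]), [3]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [1, 4]),
    ],
    sparse_threshold=0,  # always return a dense array
)

Xt = preprocess.fit_transform(X)
print(Xt.shape)  # 2 numeric + 1 ordinal + 4 one-hot columns
```

You can then put preprocess as the first step of a Pipeline in front of your estimator and call fit on that.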