python machine-learning scikit-learn preprocessor one-hot-encoding

OneHotEncoder categories argument

In sklearn 0.22 the categorical_features argument is removed, so the following code no longer runs:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])
encoder = OneHotEncoder(categorical_features=[1], sparse=False)

print(encoder.fit_transform(X))

How do I achieve the same behavior as the code above using the categories argument? OneHotEncoder(categories=[[1, 2], [1, 2, 3]], sparse=False) also encodes the first column, and OneHotEncoder(categories=[[1, 2, 3]], sparse=False) throws an error.

Solution

So you want to one-hot encode the second column [1, 2, 3] and pass the first column [1, 2, 1] through unchanged. In newer sklearn versions (0.20 and later), you can use ColumnTransformer to apply different preprocessing steps to different columns:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])
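# One-hot encode column index 1 only; remainder="passthrough" keeps column 0 as-is.
# The encoded columns come first in the output, followed by the passed-through ones.
encoder = ColumnTransformer(
    [('number1', OneHotEncoder(dtype='int'), [1])],
    remainder="passthrough"
)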

print(encoder.fit_transform(X))

With this approach you don't have to specify the value range with categories, because the encoder infers the categories from the selected column during fit. Refer to the ColumnTransformer documentation for further details.
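If you do want to fix the categories explicitly, here is a minimal sketch of how that could look. It assumes that categories expects one list per column the encoder actually receives, which would explain the error in the question: the encoder was given two columns but only one list. Inside the ColumnTransformer the encoder only sees column 1, so a single list is enough. Setting sparse_threshold=0 is my own addition to force a dense result regardless of version; the variable names are illustrative.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])

# Outside a ColumnTransformer the encoder sees both columns of X,
# so a single category list does not match the number of features.
mismatch_raised = False
try:
    OneHotEncoder(categories=[[1, 2, 3]]).fit(X)
except ValueError as exc:
    mismatch_raised = True
    print("one list for two columns fails:", exc)

# Inside the ColumnTransformer the encoder only receives column 1,
# so one category list is the right shape.
encoder = ColumnTransformer(
    [("number1", OneHotEncoder(categories=[[1, 2, 3]], dtype=int), [1])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: always return a dense array
)

result = encoder.fit_transform(X)
print(result)
# [[1 0 0 1]
#  [0 1 0 2]
#  [0 0 1 1]]
```

The first three output columns are the one-hot encoding of the second input column, and the last column is the untouched first input column.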