The categorical_features argument of OneHotEncoder is removed in sklearn 0.22, so the following code will no longer run:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
X = np.array([[1, 1], [2, 2], [1, 3]])
encoder = OneHotEncoder(categorical_features=[1], sparse=False)
print(encoder.fit_transform(X))
The question is: how do I achieve the same behavior as in the code above using the categories argument? OneHotEncoder(categories=[[1, 2], [1, 2, 3]], sparse=False) would also encode the first column, and OneHotEncoder(categories=[[1, 2, 3]], sparse=False) throws an error.
So you want to one-hot encode the second column [1, 2, 3] and pass the first column [1, 2, 1] through unchanged. In sklearn 0.20 and later, you can use ColumnTransformer to apply different preprocessing steps to different columns, like this:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
X = np.array([[1, 1], [2, 2], [1, 3]])
encoder = ColumnTransformer(
    [('number1', OneHotEncoder(dtype='int'), [1])],
    remainder="passthrough"
)
print(encoder.fit_transform(X))
This way you don't have to specify the value range with categories, because the encoder infers the values from the data. Refer to the documentation for further details.
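To check what the ColumnTransformer approach produces, here is a minimal sketch on the same X. The sparse_threshold=0 argument is my addition, not part of the answer above: it asks ColumnTransformer to always return a dense array so the result is easy to inspect. The variable names result and learned are only illustrative.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])

# One-hot encode column 1 only; column 0 is passed through unchanged.
# sparse_threshold=0 (an assumption added for this sketch) forces dense output.
encoder = ColumnTransformer(
    [("number1", OneHotEncoder(dtype=int), [1])],
    remainder="passthrough",
    sparse_threshold=0,
)
result = np.asarray(encoder.fit_transform(X))

# Categories the encoder inferred from the data for the encoded column.
learned = encoder.named_transformers_["number1"].categories_[0]

print(result)
print(learned)
```

The encoded columns come first and the passed-through column is appended on the right, so the result has four columns: three indicator columns for the values 1, 2, 3 of the second column, followed by the original first column.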