OneHotEncoder categories argument

In sklearn 0.22 the categorical_features argument is removed, so the following code no longer runs:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])
encoder = OneHotEncoder(categorical_features=[1], sparse=False)


How do I achieve the same behavior as in the code above using the categories argument? OneHotEncoder(categories=[[1, 2], [1, 2, 3]], sparse=False) would also encode the first column, and OneHotEncoder(categories=[[1, 2, 3]], sparse=False) throws an error.


  • So you want to one-hot encode the second column [1, 2, 3] and pass the first column [1, 2, 1] through unchanged. In newer sklearn versions, you can use ColumnTransformer to apply different preprocessing steps to different columns, like this:

    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    X = np.array([[1, 1], [2, 2], [1, 3]])

    # One-hot encode column 1 only; pass column 0 through unchanged.
    encoder = ColumnTransformer(
        [('number1', OneHotEncoder(dtype='int'), [1])],
        remainder='passthrough',
    )

    This way you don't have to specify the value range with categories, because the encoder infers it from the data. Refer to the documentation for further details.
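
If you still want to pin the value range explicitly, the categories argument can be combined with the same approach. The following is a minimal sketch, not the only way to do it. It assumes that the values 1, 2 and 3 are the full set of categories for the second column, and it sets sparse_threshold=0 on the ColumnTransformer (my choice, not part of the answer above) so that a dense array is returned without relying on the encoder's sparse argument.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1, 1], [2, 2], [1, 3]])

# Inside the ColumnTransformer the encoder only sees column 1,
# so `categories` needs exactly one list of values.
one_hot = OneHotEncoder(categories=[[1, 2, 3]], dtype=int)

encoder = ColumnTransformer(
    [("number1", one_hot, [1])],
    remainder="passthrough",  # keep column 0 as it is
    sparse_threshold=0,       # assumption: always return a dense array
)

result = encoder.fit_transform(X)
print(result)
```

The encoded columns come first in the result, followed by the passed-through column, so the rows are [1, 0, 0, 1], [0, 1, 0, 2] and [0, 0, 1, 1].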
