python machine-learning scikit-learn one-hot-encoding

How to get number of dimensions in OneHotEncoder in Scikit-learn


I am using OneHotEncoder from scikit-learn in my project, and I need to know the size of each one-hot vector when n_values is set to 'auto'. I thought n_values_ would show that, but it seems the only way is to transform a training sample and measure the result. I made this toy example to show the problem. Is there another solution?

from sklearn.preprocessing import OneHotEncoder

data = [[1], [3], [5]]  # 3 samples of a single feature, with 3 distinct values

encoder = OneHotEncoder()
encoder.fit(data)

print(len(encoder.transform([data[0]]).toarray()[0]))  # 3: number of dimensions in the one-hot vector
print(encoder.n_values_)  # [6]: largest value + 1, i.e. len(range(6)), not the vector size

Solution

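  • Is this what you are looking for? active_features_ lists the indices that actually occurred in the training data, so its length is the size of the one-hot vector.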

    >>> encoder.active_features_
    array([1, 3, 5])
    
    >>> len(encoder.active_features_)
    3
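
    Note that n_values_ and active_features_ belong to the older OneHotEncoder API. As far as I know they were deprecated in scikit-learn 0.20 and removed in later releases. Here is a minimal sketch of the same check on a newer version, assuming the default encoder settings (no drop, no infrequent-category grouping), where the fitted categories_ attribute holds one array of learned categories per input feature:

```python
from sklearn.preprocessing import OneHotEncoder

data = [[1], [3], [5]]  # 3 samples of a single feature, with 3 distinct values

encoder = OneHotEncoder()
encoder.fit(data)

# categories_ is a list with one array of learned categories per input feature;
# with default settings, the one-hot width is the total number of categories.
n_dims = sum(len(cats) for cats in encoder.categories_)
print(n_dims)

# Cross-check against the width of an actually transformed row.
assert encoder.transform([data[0]]).shape[1] == n_dims
```

    With several input features, the sum adds up the category counts of all columns, which is the width of the concatenated one-hot output.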