python, sklearn-pandas

Receiving a value error when using OneHotEncoder and fitting data


I'm working on an assignment where we use scikit-learn's OneHotEncoder to make all categories print out. Here is a sample of the data and the code I used to transform it:

      grade sub_grade  short_emp  emp_length_num home_ownership        term
0          B        B2          0              11           RENT   36 months
1          C        C4          1               1           RENT   60 months
2          C        C5          0              11           RENT   36 months
3          C        C1          0              11           RENT   36 months
4          A        A4          0               4           RENT   36 months
5          E        E1          0              10           RENT   36 months

Code:

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(categorical_features='all', handle_unknown='error', n_values='auto', sparse=True)
encoder.fit(lending_club)

The error I'm receiving is on the term column:

ValueError: could not convert string to float: ' 36 months'

Solution

  • In older scikit-learn releases (before 0.20), OneHotEncoder does not support string features. You have to convert them to integers first, using LabelEncoder for example. Another option would be to apply LabelBinarizer to each column separately.

    See How to do Onehotencoding in Sklearn Pipeline.
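
The LabelEncoder route suggested above could look roughly like the sketch below. It is not a definitive implementation: the small `lending_club` frame is a hypothetical reconstruction of a few string columns from the sample in the question, and the names `label_encoders`, `encoded` and `one_hot` are made up for illustration. It deliberately avoids the `categorical_features`, `n_values` and `sparse` arguments so that it does not depend on parameters that differ between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical subset of the question's data (string columns only).
lending_club = pd.DataFrame({
    "grade": ["B", "C", "C", "C", "A", "E"],
    "home_ownership": ["RENT"] * 6,
    "term": [" 36 months", " 60 months", " 36 months",
             " 36 months", " 36 months", " 36 months"],
})

# Step 1: turn every string column into integer codes.
# LabelEncoder works on one column at a time, so keep one encoder per column.
label_encoders = {}
encoded = pd.DataFrame(index=lending_club.index)
for column in lending_club.columns:
    le = LabelEncoder()
    encoded[column] = le.fit_transform(lending_club[column])
    label_encoders[column] = le

# Step 2: one-hot encode the integer codes.
# The default output is a sparse matrix; toarray() makes it dense for inspection.
encoder = OneHotEncoder(handle_unknown="error")
one_hot = encoder.fit_transform(encoded).toarray()

print(one_hot.shape)
print(one_hot)
```

Keeping the fitted encoders in `label_encoders` makes it possible to map the integer codes back to the original strings later with `inverse_transform`.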