I'm working on an assignment where we are using OneHotEncoder in scikit-learn so that all categories are printed out. Here is a sample of the data and the code I used to transform it:
   grade  sub_grade  short_emp  emp_length_num  home_ownership        term
0      B         B2          0              11            RENT   36 months
1      C         C4          1               1            RENT   60 months
2      C         C5          0              11            RENT   36 months
3      C         C1          0              11            RENT   36 months
4      A         A4          0               4            RENT   36 months
5      E         E1          0              10            RENT   36 months
Code:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(categorical_features='all', handle_unknown='error', n_values='auto', sparse=True)
encoder.fit(lending_club)
The error I'm receiving is on the term column:
ValueError: could not convert string to float: ' 36 months'
OneHotEncoder does not support string features. You have to convert them to integers first, for example with LabelEncoder. Another option would be to use LabelBinarizer on all columns.
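As a rough sketch of both suggestions, the snippet below rebuilds the six sample rows from the question as a stand-in for lending_club (the real data is assumed to be a pandas DataFrame with these columns). The names label_encoders, encoded, one_hot and binarized are made up for illustration, and OneHotEncoder is created with its default arguments rather than the ones in the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# Stand-in for lending_club, rebuilt from the sample rows in the question
# (assumption: the real data is a DataFrame with the same columns).
lending_club = pd.DataFrame({
    "grade": ["B", "C", "C", "C", "A", "E"],
    "sub_grade": ["B2", "C4", "C5", "C1", "A4", "E1"],
    "short_emp": [0, 1, 0, 0, 0, 0],
    "emp_length_num": [11, 1, 11, 11, 4, 10],
    "home_ownership": ["RENT"] * 6,
    "term": [" 36 months", " 60 months", " 36 months",
             " 36 months", " 36 months", " 36 months"],
})

# Option 1, step 1: turn every column into integer codes.
# One LabelEncoder is kept per column so the codes can be mapped back later.
label_encoders = {}
encoded = pd.DataFrame(index=lending_club.index)
for col in lending_club.columns:
    le = LabelEncoder()
    encoded[col] = le.fit_transform(lending_club[col])
    label_encoders[col] = le

# Option 1, step 2: one-hot encode the integer codes.
# The default output is a sparse matrix, so convert it for inspection.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(encoded).toarray()
print(one_hot.shape)  # (6, 19): one column per distinct value in each input column

# Option 2: apply a LabelBinarizer to each column and stack the results.
# Note that a two-class column becomes a single 0/1 column here.
binarized = np.hstack([
    LabelBinarizer().fit_transform(lending_club[col])
    for col in lending_club.columns
])
print(binarized.shape)
```

Keeping the fitted LabelEncoder for each column lets you recover the original labels with inverse_transform. Also note that scikit-learn 0.20 and later accept string columns in OneHotEncoder directly, so on those versions the LabelEncoder step may not be needed.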