python-3.x pandas sklearn-pandas

How to deal with non-integer data in sklearn


I am very new to sklearn and pandas and was wondering how to deal with non-integer values. I have worked through examples where the data was all ints, and they worked, but now I am working with strings and it is not working. I have tried astype, but it did not help.

The data is in a CSV file and looks like this:

|value| type|
|a    |    g|
|b    |    g|
|a    |    g|
|d    |    g|
|c    |    k|
|f    |    g|

value is the target, but I do not know how to turn this data into X and y so that I can call something like fit.


Solution

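  • Most sklearn estimators can't work directly with categorical feature values (object, string, etc.). You have to transform your categorical variables into numerical variables with an encoder such as sklearn.preprocessing.LabelEncoder. LabelEncoder assigns integers to the sorted unique labels, for example a --> 0 / b --> 1 / c --> 2 / d --> 3 / f --> 4. Note that LabelEncoder is intended for the target column; for feature columns, sklearn.preprocessing.OneHotEncoder or OrdinalEncoder is the usual choice. If you want to customize the categorical --> numerical transformation (say d --> 2 and c --> 3), you should do the transformation manually.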
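
    A minimal sketch of the encoding step described above, assuming the CSV has exactly the two columns shown in the question. The rows are built inline instead of being read from a file so the example runs on its own. The choice of DecisionTreeClassifier, the one-hot encoding of the type column, and the names custom_mapping and y_custom are illustrative assumptions, not part of the original question or answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Same rows as the table in the question (normally: df = pd.read_csv(...)).
df = pd.DataFrame({
    "value": ["a", "b", "a", "d", "c", "f"],
    "type":  ["g", "g", "g", "g", "k", "g"],
})

# Target column: LabelEncoder maps the sorted unique labels to 0..n-1.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df["value"])
print(dict(zip(label_encoder.classes_, range(len(label_encoder.classes_)))))

# Feature column: one-hot encode so no artificial order is implied.
# Double brackets keep X two-dimensional, as fit() expects.
feature_encoder = OneHotEncoder(handle_unknown="ignore")
X = feature_encoder.fit_transform(df[["type"]])

# Manual alternative for the target: a hand-written mapping (hypothetical
# values) applied with pandas, for when a specific numbering is wanted.
custom_mapping = {"a": 0, "b": 1, "d": 2, "c": 3, "f": 4}
y_custom = df["value"].map(custom_mapping)

# Any classifier could be used here; a decision tree is just an example.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# New data must go through the same fitted encoder before predict().
new_rows = pd.DataFrame({"type": ["k"]})
predicted = model.predict(feature_encoder.transform(new_rows))
predicted_labels = label_encoder.inverse_transform(predicted)
print(predicted_labels)
```

    inverse_transform converts the predicted integers back into the original string labels, so the encoding never has to leak into the results you report.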