python machine-learning encoding data-science categorical

Convert categorical features to numerical


I have a lot of categorical columns and want to convert the values in those columns to numerical values so that I can apply an ML model.

My data looks something like this:

Column 1: Good / bad / poor / not reported
Column 2: Red / amber / green
Column 3: 1 / 2 / 3
Column 4: Yes / No

I have already assigned the numerical values 1, 2, 3, 4 to good, bad, poor, not reported in column 1.

Can I now give the same numerical values, such as 1, 2, 3, to red, green, amber in column 2, and do the same for the other columns, or will that confuse the model when I train it?


Solution

  • You can do this for the rated (ordered) columns by using df[colname].map({}) or LabelEncoder(). Both replace each category with a number, which implies an order and a distance between the categories: if poor is 1 and good is 3, the model sees a difference between them, and for a rating that is exactly what you want it to learn. Reusing the same integers (1, 2, 3) in several columns is not a problem in itself, because each column is a separate feature. For something like colors, though, there is no preference: green is no better or worse than red, so integer codes would suggest an order that does not exist. For those columns it is better not to use the same method and to use get_dummies in pandas instead.
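
A minimal sketch of both approaches, assuming a small made-up DataFrame. The column names (quality, color, level, flag), the sample rows, and the choice to treat "not reported" as a missing value are illustrative assumptions, not part of the original question. An explicit map is used for the rating column because LabelEncoder assigns its codes in sorted (alphabetical) order, which would not match poor < bad < good.

```python
import pandas as pd

# Hypothetical sample data mirroring the four kinds of columns in the question.
df = pd.DataFrame({
    "quality": ["good", "bad", "poor", "not reported"],
    "color": ["red", "amber", "green", "red"],
    "level": [1, 2, 3, 1],
    "flag": ["Yes", "No", "Yes", "No"],
})

# Ordered (rated) column: an explicit mapping keeps the order under your control.
# Assumption: "not reported" is left out of the mapping, so map() turns it into NaN.
quality_order = {"poor": 1, "bad": 2, "good": 3}
df["quality"] = df["quality"].map(quality_order)

# Binary column: a 0/1 mapping is enough.
df["flag"] = df["flag"].map({"No": 0, "Yes": 1})

# Unordered column: one 0/1 indicator column per color instead of integer codes.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Column 3 (level) is already numeric and is left unchanged here. How to handle the NaN produced for "not reported" (drop, impute, or give it its own indicator column) depends on the model and is left open in this sketch.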