Tags: python, pandas, dataset, data-preprocessing

How to convert a column of strings into a column of integers?


I am trying to make predictions on a dataset that has a column containing different strings. For example, there are 3 brands, 'A', 'B', and 'C', and I want to replace them with numbers (0, 1, and 2, for example).

I know how to do this when there are only 2 brands, using Series.eq.

I have tried using set, but I would like to know if there is an easier way, since I will have to do this for columns with more than 5 different strings, and doing it by hand would be tedious.


Solution

  • You can replace them by selecting the rows that match each condition, assuming your data is in df and the column of interest is 'Brand':

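    # Map each brand label to the integer that should replace it.
    replacement = {'A': 0, 'B': 1, 'C': 2}
    for key, value in replacement.items():
        # Overwrite the rows whose 'Brand' equals the current label.
        df.loc[df['Brand'] == key, 'Brand'] = value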
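
  • For columns with many distinct strings, the loop can probably be avoided. The sketch below is not part of the original answer. It shows two common pandas alternatives: Series.map with an explicit dictionary, and pd.factorize, which assigns the integers for you. The sample DataFrame and the new column names 'Brand_mapped' and 'Brand_code' are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'Brand' mirrors the column name used above.
df = pd.DataFrame({'Brand': ['A', 'C', 'B', 'A', 'C']})

# Option 1: explicit mapping with Series.map.
# Labels that are missing from the dictionary become NaN.
replacement = {'A': 0, 'B': 1, 'C': 2}
df['Brand_mapped'] = df['Brand'].map(replacement)

# Option 2: let pandas assign the codes, so no dictionary is needed.
# sort=True numbers the labels in sorted order; 'labels' keeps the
# original strings so the codes can be translated back later.
codes, labels = pd.factorize(df['Brand'], sort=True)
df['Brand_code'] = codes

print(df)
print(list(labels))
```

    Option 1 keeps control over which number each brand gets. Option 2 scales to any number of distinct strings, at the cost of letting pandas choose the numbering.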