
How to Replace Multiple Strings in a DataFrame Using Python


I have a DataFrame with 73k rows. Here is some sample data:

Index    Customers' Name   States
0        Alpha             Oregon
1        Alpha             Oregon
2        Bravo             Utah
3        Bravo             Utah
4        Charlie           Alabama
5        Charlie           Alabama
6        Alpha             Oregon
7        Alpha             Oregon
8        Bravo             Utah

The names repeat across rows, but I am not allowed to delete or remove the duplicates because they are mandatory for my research. Instead, I would like to replace each customer's name with a specific code (a pseudonym) so that the result looks like this:

Index    Customers' Name   States
0        z1                Oregon
1        z1                Oregon
2        z2                Utah
3        z2                Utah
4        z3                Alabama
5        z3                Alabama
6        z1                Oregon
7        z1                Oregon
8        z2                Utah 

I'm still a beginner and have been learning Python for around 3 months. How can I change this in bulk, given that I have 73k rows like this? I assume it has to be done with a 'for' loop. I already tried, but I couldn't get it to work. Please help me solve this.


Solution

  • You don't need an explicit loop. You can use .groupby() with .ngroup(), which assigns one number to each distinct name:

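    # Number each distinct name (0, 1, 2, ...), shift the numbering
    # to start at 1, then prefix it with "z".
    df["Customers' Name"] = "z" + (
        df.groupby("Customers' Name").ngroup() + 1
    ).astype("str")

    print(df)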
    

    Prints:

      Customers' Name   States
    0              z1   Oregon
    1              z1   Oregon
    2              z2     Utah
    3              z2     Utah
    4              z3  Alabama
    5              z3  Alabama
    6              z1   Oregon
    7              z1   Oregon
    8              z2     Utah
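
One thing to keep in mind with the approach above: .groupby() sorts the group keys by default, so the numbers follow the alphabetical order of the names. That happens to match the order of first appearance in the sample data. If the codes should instead follow the order in which names first appear, a possible alternative is pd.factorize(). The sketch below is self-contained and rebuilds the sample data from the question. The name_to_code dictionary is a hypothetical addition (not part of the original answer) that keeps a lookup table so each code can be traced back to its name later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Customers' Name": [
            "Alpha", "Alpha", "Bravo", "Bravo", "Charlie",
            "Charlie", "Alpha", "Alpha", "Bravo",
        ],
        "States": [
            "Oregon", "Oregon", "Utah", "Utah", "Alabama",
            "Alabama", "Oregon", "Oregon", "Utah",
        ],
    }
)

# factorize() returns one integer code per row (0, 1, 2, ...), numbered
# in order of first appearance, plus the distinct values in that order.
codes, uniques = pd.factorize(df["Customers' Name"])

# Hypothetical lookup table (an assumption for illustration):
# original name -> replacement code.
name_to_code = {name: f"z{i + 1}" for i, name in enumerate(uniques)}

# Build the replacement column without an explicit loop over the rows.
df["Customers' Name"] = "z" + pd.Series(codes + 1, index=df.index).astype(str)

print(df)
print(name_to_code)
```

Both approaches are vectorized, so 73k rows should not need a 'for' loop over the rows.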