Search code examples
pythonpandasdataframereplacecontains

Check if pandas column contains text in another dataframe and replace values


I have two df's, one for user names and another for real names. I'd like to know how I can check if I have a real name in my first df using the data of the other, and then replace it. For example:

import pandas as pd
df1 = pd.DataFrame({'userName':['peterKing', 'john', 'joe545', 'mary']})
df2 = pd.DataFrame({'realName':['alice','peter', 'john', 'francis', 'joe', 'carol']})

df1
userName
0   peterKing
1   john
2   joe545
3   mary

df2
realName
0   alice
1   peter
2   john
3   francis
4   joe
5   carol

My code should replace 'peterKing' and 'joe545' since these names appear in my df2. I tried using pd.contains, but I can only verify if a name appears or not. The output should be like this:

userName
0   peter
1   john
2   joe
3   mary

Can someone help me with that? Thanks in advance!


Solution

  • You can use loc[row, colum], here you can see the documentation about loc method. And Series.str.contain method to select the usernames you need to replace with the real names. In my opinion, this solution is clear in terms of readability.

    for real_name in df2['realName'].to_list():
      df1.loc[ df1['userName'].str.contains(real_name), 'userName' ] = real_name
    

    Output:

       userName
    0   peter
    1   john
    2   joe
    3   mary