Search code examples
pandasdataframereplacemergeapply

pandas column partial substring replace by another dataframe column if substring contains the string


I am having replace issue while I try to replace a string with a string from another dataframe.

Below is my data :

import pandas as pd
data={"ID": ["zx125", "zx137", "zx897", "zx567", "zx694"],"Long_string": ["TM~AA~Loc1::MH~CC~Loc1::LH~BF~Loc2::CM~CD~Loc2::CH~CD~Loc2", "MH~BGG~Loc1::LH~BF~Loc2::CM~RR~Loc2", "TM~VD~Loc1::MH~EC~Loc1::LH~BF~Loc2", "MH~BF~Loc1::LH~BF~Loc2::CH~CD~Loc2","DF~VD~Loc3::MH~EC~Loc1::LH~BF~Loc2::CH~CD~Loc2"]}
df1 = pd.DataFrame(data)

enter image description here

mapping = {"Before Mapping": ["MH~CC~Loc1", "LH~BF~Loc2", "CH~CD~Loc2"],"After Mapping": ["MH~RT~Loc1","LH~BE~Loc2","CH~CE~Loc2"]}
df2 = pd.DataFrame(mapping)

enter image description here

how can I get a New_string in original dataframe.

After mapping with the join and replace by df2. the expected output dataframe.

output={"ID": ["zx125", "zx137", "zx897", "zx567", "zx694"],"Long_string": ["TM~AA~Loc1::MH~CC~Loc1::LH~BF~Loc2::CM~CD~Loc2::CH~CD~Loc2", "MH~BGG~Loc1::LH~BF~Loc2::CM~RR~Loc2", "TM~VD~Loc1::MH~EC~Loc1::LH~BF~Loc2", "MH~BF~Loc1::LH~BF~Loc2::CH~CD~Loc2","DF~VD~Loc3::MH~EC~Loc1::LH~BF~Loc2::CH~CD~Loc2"],"New_string": ["TM~AA~Loc1::MH~RT~Loc1::LH~BE~Loc2::CM~CD~Loc2::CH~CE~Loc2", "MH~BGG~Loc1::LH~BE~Loc2::CM~RR~Loc2", "TM~VD~Loc1::MH~EC~Loc1::LH~BE~Loc2", "MH~BF~Loc1::LH~BE~Loc2::CH~CE~Loc2","DF~VD~Loc3::MH~EC~Loc1::LH~BE~Loc2::CH~CE~Loc2"]}
desire_output = pd.DataFrame(output)

enter image description here

I tried to use replace and apply. but i can't get the desired output.

Thanks for your help!


Solution

  • You can get your desired result by splitting the Long_string values on ::, replacing according to df2 and then grouping on the index and joining the values back together again:

    df1['New_string'] = (df1['Long_string']
        .str
        .split('::')
        .explode()
        .replace(df2.set_index('Before Mapping')['After Mapping'])
        .groupby(level=0)
        .agg('::'.join)
    )
    

    Output:

          ID                                                 Long_string                                                  New_string
    0  zx125  TM~AA~Loc1::MH~CC~Loc1::LH~BF~Loc2::CM~CD~Loc2::CH~CD~Loc2  TM~AA~Loc1::MH~RT~Loc1::LH~BE~Loc2::CM~CD~Loc2::CH~CE~Loc2
    1  zx137                         MH~BGG~Loc1::LH~BF~Loc2::CM~RR~Loc2                         MH~BGG~Loc1::LH~BE~Loc2::CM~RR~Loc2
    2  zx897                          TM~VD~Loc1::MH~EC~Loc1::LH~BF~Loc2                          TM~VD~Loc1::MH~EC~Loc1::LH~BE~Loc2
    3  zx567                          MH~BF~Loc1::LH~BF~Loc2::CH~CD~Loc2                          MH~BF~Loc1::LH~BE~Loc2::CH~CE~Loc2
    4  zx694              DF~VD~Loc3::MH~EC~Loc1::LH~BF~Loc2::CH~CD~Loc2              DF~VD~Loc3::MH~EC~Loc1::LH~BE~Loc2::CH~CE~Loc2