Search code examples
pythonpandasdataframepartial

Pandas map two dataframe based on column name mentioned on the other df and partial match to derive new column


I have two dataframes df1 & df2 as below.

import pandas as pd


data1 = {'Column1': [1, 2, 3],
        'Column2': ['Account', 'Biscut', 'Super'],
        'Column3': ['Funny', 'Super', 'Nice']}

df1 = pd.DataFrame(data1)

data2 = {'ColumnName':['Column2','Column3','Column1'],
     'ifExist':['Acc','Sup',3],
     'TarName':['Account_name','Super_name','Val_3']}
df2 = pd.DataFrame(data2)

I want to add new column TarName to the df1 by partially matching the ifExists value from df2 against the ColumnName that Mentioned in the df2 with df1.

My Expected Ouput is:

Column1    column2     column3  TarName
1          Account     Funny    Account_Name
2          Biscut      Super    Super_name  
3          Super       Nice     Val_3

I have tried below code. This code able to partial map but only to one column. With this approach I Need to create dictionaries as many column mapping I have and need to apply as many.

Is there more dynamic approach ?

df2_Column2_dict = df2[df2['ColumnName']=='Column2'].set_index(['ifExist'])['TarName'].to_dict()
pat = r'({})'.format('|'.join(df2_Column2_dict.keys()))
extracted = df1['Column2'].str.extract(pat, expand=False).dropna()
df1['TarName'] =  extracted.apply(lambda x: df2_Column2_dict[x]).reindex(df2.index)

print(df1)

Solution

  • I hope help you with the code. For each concept/line is the explanation with a comment.

    import pandas as pd
    
    # Create DataFrames
    data1 = {'Column1': [1, 2, 3],
            'Column2': ['Account', 'Biscut', 'Super'],
            'Column3': ['Funny', 'Super', 'Nice']}
    df1 = pd.DataFrame(data1)
    
    data2 = {'ColumnName':['Column2','Column3'],
            'ifExist':['Acc','Sup'],
            'TarName':['Account_name','Super_name']}
    df2 = pd.DataFrame(data2)
    
    # Initialize new column in df1
    df1['TarName'] = ''
    
    # Iterate over each row in df2
    for _, row in df2.iterrows():
        # Column in df1 to search
        column_name = row['ColumnName']  
        
        # Partial value to search for
        if_exist = row['ifExist']        
        
        # Value to assign if there is a match
        tar_name = row['TarName']        
    
        # Find partial matches and assign TarName
        mask = df1[column_name].str.contains(if_exist, case=False, na=False)
        df1.loc[mask, 'TarName'] = tar_name
    
    # Show the result
    print(df1)
    

    Edit: Tell if you need to solve it without loops