Search code examples
pythonpandasstringdataframelookup

Merging dataframes based on pairs


I have a dataframe that looks like this:

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:']
                   })

The datatype of col_2 is a string, so we must do string manipulation/regex.

I also have another dataframe that has a mapping between key-value pair from col_2. It looks like this:

df1 = pd.DataFrame({'col_1': ['a', 'c', '', 'w', 'x', 'a', 'h', 'j','t'],
                    'col_2': ['b', 'd', 'v', '','y', 'g', 'b', '', 'g'],
                    'col_3': ['aw', 'rt', 'er', 'aa', 'ey', 'wk', 'oo', 'ri', 'ty'],
                    'col_4': ['rt', 'yu', 'gq', 'tr', 'ui', 'pi', 'pw', 'pp', 'uu']
                   })

basically a:b translated to aw:rt, which means you can't reach aw and rt without both a and b,

I want to get all the values from col_4 corresponding to the key-value pairs in col_2, so i want my output to be

pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:'],
                   'col_3': ['rt,yu', 'gq', 'tr,ui','pi,pw,pp' ]
                   })

I am able to extract key, value pair as different columns using

df[['c1', 'c2']] = df['col_2'].str.extract(r'^([^:,]*):([^:,]*)')

so I can extract all the key-value pairs as columns and then do merge, but it looks like a lengthy route, Any other optimised way?


Solution

  • I would use the basic pandas methods here. Split and explode col_2 to get the individual pairs, create a mapping from pairs to col_4 and just map it to replace the values.

    pairs = df['col_2'].str.split(',').explode()
    mapping = df1['col_4'].set_axis(df1['col_1'] + ':' + df1['col_2'])
    df['col_3'] = pairs.map(mapping).groupby(level=0).agg(','.join)
    

    res