Search code examples
pythonpandasrowvlookup

Look up in other rows for missing matches in a row (Pandas)


I have a dataframe that looks similar to this:

df1 = pd.DataFrame({'product': ['widget1', 'widget1', 'widget1', 'widget2', 'widget2', 'widget3', 'widget3', 'widget4'],
               'group': ['group1', 'group1', '', 'group1', '','group2', '', '']})

and want to output this:

df2 = pd.DataFrame({'product': ['widget1', 'widget1', 'widget1', 'widget2', 'widget2', 'widget3', 'widget3', 'widget4'],
               'group': ['group1', 'group1', '', 'group1', '','group2', '', ''],
               'new group': ['group1', 'group1', 'group1', 'group1', 'group1','group2', 'group2', '']})

So that a new column is created that:

  • populates over values from the 2nd column if not nan
  • for nan, a look up is made against the corresponding value in the first column and any match for that first column value in any other row where that value has a match with column 2 and, if so, pull in that value into the new column
  • if none found, leave as null

Any help on this would be appreciated! Thanks!


Solution

  • Try this set of codes:

    import numpy as np
    
    df1['group'][df1['group'] == ''] = np.nan          
    df1['new group'] = df1.groupby('product')['group'].ffill().bfill()   
    df1[['group', 'new group']] = df1[['group', 'new group']].fillna('')
    

    Output result:

       product   group new group
    0  widget1  group1    group1
    1  widget1  group1    group1
    2  widget1            group1
    3  widget2  group1    group1
    4  widget2            group1
    5  widget3  group2    group2
    6  widget3            group2
    7  widget4                  
    

    The codes work as follows:

    1st line: Temporarily set empty string in column group to nan

    2nd line: Group the rows by the same product groups. Then for each group with same product, look into the column group and perform both forward fill and backward fill to fillup any nan values with non-nan values anywhere in both directions within the group. The resulting Series of data (with null values now filled-up as far as possible) is copied to the new column new group.

    3rd line: reset the empty strings previously translated to nan back to empty string. If other columns in your original dataframe does not contain nan values, you can simplify this line of code to df1.fillna('')

    Note that these codes assumed for every product, there can only be one possible value of group value (if any). If there can be 2 or above, the forward fill and backward fill operations may get different values depending on the positions of the multiple non-null group values.