Search code examples
pandasduplicatesunique

fill only unique value from another dataframe based on condition


How to fill '0' value in df1 from unique value from another dataframe (df2). the expected output is no duplicate in df1.

any reference links for this. thank for helping out.

data1 = {'test' :['b',0,'e',0,0,'f']}
df1 = pd.DataFrame(data=data1)

data2 = {'test' :['a','b','c','d','e','f']}
df2 = pd.DataFrame(data=data2)

df1

   test
0   b  
1   0
2   e  
3   0  
4   0  
5   f  

df2

   test
0   a
1   b
2   c
3   d
4   e
5   f

expected output:

   test
0   b  -- df1
1   a  -- fill with a from df2
2   e  -- df1
3   c  -- fill with c from df2
4   d  -- fill with d from df2
5   f  -- df1

Solution

  • Assuming you have enough unique values in df2 to fill the 0s in df1, extract those unique values, and assign them with boolean indexing:

    # which rows are 0?
    m = df1['test'].eq(0)
    
    # extract values of df2 that are not in df1
    vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].tolist()
    # ['b', 'e', 'f']
    
    # assign the values in the limit of the needed number
    df1.loc[m, 'test'] = vals[:m.sum()]
    
    print(df1)
    

    Output:

      test
    0    b
    1    a
    2    e
    3    c
    4    d
    5    f
    

    If there is not always enough values in df2 and you want to fill the first possible 0s:

    m = df1['test'].eq(0)
    
    vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].unique()
    # ['b', 'e', 'f']
    
    m2 = m.cumsum().le(len(vals))
    
    df1.loc[m&m2, 'test'] = vals[:m.sum()]
    
    print(df1)