Search code examples
pandasmergeapplyassign

Pandas merge and Apply or Assign function not working


I have series of operations on pandas dataframes.

Due to some of the filter condition(s) there could be a possibility that some of the dataframes can be empty.

Below apply function is working when I have data in the input dataframes where as it is not returning the apply logic output on a new column where there is no data in the input dataframes.

Below is the Sample code.


import pandas as pd
import numpy as np

#df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'p': [11, 12, 13]})
#df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [7, 8, 9], 'q': [21, 22, 23]})

df1 = pd.DataFrame({'A': [], 'B': [], 'p': []})
df2 = pd.DataFrame({'A': [], 'B': [], 'q': []})


merged_df = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
                .assign(D=lambda x: np.where(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y', 
                                             (((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])),'N','NA')
                                             ))

print(merged_df)

Above code return me an error that ValueError: operands could not be broadcast together with shapes (0,) () (3,)

I Tried something below Using Apply

merged_df1 = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
                .apply(lambda x: 'Y' if (x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q']) else
                       'N' if (x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q']) else None
                       )
                
print(merged_df1)

merged_df1 did not return me any error either didn't ouput the apply function logic


Solution

  • You code works fine, but you have to use numpy.where in the inner condition as well:

    merged_df = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
                    .assign(D=lambda x: np.where(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y', 
                                                 np.where(((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])),'N','NA')
                                                 ))
    

    For completeness, you could also use pure pandas:

    merged_df = (pd
       .merge(df1, df2, on='A', suffixes=('', '_d'))
       .assign(D=lambda x: ((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])).map({True: 'N', False: 'NA'})
                           .mask(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y')
               )
    )
    

    Output 1:

       A  B   p  B_d   q   D
    0  1  4  11    7  21   Y
    1  2  5  12    8  22  NA
    2  3  6  13    9  23  NA
    

    Output 2:

    Empty DataFrame
    Columns: [B, p, A, B_d, q, D]
    Index: []