I have series of operations on pandas dataframes.
Due to some of the filter condition(s) there could be a possibility that some of the dataframes can be empty.
Below apply function is working when I have data in the input dataframes where as it is not returning the apply logic output on a new column where there is no data in the input dataframes.
Below is the Sample code.
import pandas as pd
import numpy as np
#df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'p': [11, 12, 13]})
#df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [7, 8, 9], 'q': [21, 22, 23]})
df1 = pd.DataFrame({'A': [], 'B': [], 'p': []})
df2 = pd.DataFrame({'A': [], 'B': [], 'q': []})
merged_df = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
.assign(D=lambda x: np.where(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y',
(((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])),'N','NA')
))
print(merged_df)
Above code return me an error that
ValueError: operands could not be broadcast together with shapes (0,) () (3,)
I Tried something below Using Apply
merged_df1 = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
.apply(lambda x: 'Y' if (x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q']) else
'N' if (x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q']) else None
)
print(merged_df1)
merged_df1 did not return me any error either didn't ouput the apply function logic
You code works fine, but you have to use numpy.where
in the inner condition as well:
merged_df = pd.merge(df1, df2, on='A', suffixes=('', '_d'))\
.assign(D=lambda x: np.where(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y',
np.where(((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])),'N','NA')
))
For completeness, you could also use pure pandas:
merged_df = (pd
.merge(df1, df2, on='A', suffixes=('', '_d'))
.assign(D=lambda x: ((x['B'] == 5) & (x['B_d'] == 8) & (x['p'] > x['q'])).map({True: 'N', False: 'NA'})
.mask(((x['B'] == 4) & (x['B_d'] == 7) & (x['p'] < x['q'])), 'Y')
)
)
Output 1:
A B p B_d q D
0 1 4 11 7 21 Y
1 2 5 12 8 22 NA
2 3 6 13 9 23 NA
Output 2:
Empty DataFrame
Columns: [B, p, A, B_d, q, D]
Index: []