'Vlookup' and return 'Match or Not Match' based on corresponding rows using Python

I have 3 columns, Name, Date and Quantity for Table A, and Table B. And I add Column D, which is the result of the concatenated of Column Name, Date and Quantity. I would like to vlookup the Column D in Table A with Table B. If there is a match, the output would be 'Yes' at Column E; if no match, then 'No'.

If the output is 'No' in Column E, given Column D (concatenated), I would like to identify if Column Name, Date or/and Quantity is/are the reason of not match. For example, if Name is not match, return output in Column F as 'Not match name', otherwise return output as 'Match name'.

The problem I encountered right now is the output (match or no match) that I found for Name is correct, but not for Date and Quantity. I think it is mainly due to one-to-many, many-to-many relationship, in which there are multiple repetitive within Name, Date and Quantity.

My code is not consistent because I amended them from time to time due to the incorrect output especially for Date and Quantity.

This is what I tried so far:

#Concate the 3 columns

df2_A = df1_A.copy()
df2_A.loc[:, 'A_Concate'] = df2_A['Name'].astype(str) + df2_of01['Date'].astype(str) + df2_A['Quantity'].astype(str)

df2_B = df1_B.copy()
df2_B.loc[:, 'B_Concate'] = df2_B['Name'].astype(str) + df2_Name['Date'].astype(str) + df2_B['Quantity'].astype(int).astype(str)

#Vlookup concatenated column for Table A and B
df2_A ['Match with B?'] = df2_A ['A_Concate'].isin(df2_B['B_Concate']).map({True: 'Yes', False: 'No'})

#Find reason of not match
df2_A ['Match name?'] = df2_A .apply(lambda row: 'Not match name' if row['Match with B?'] == 'No' and row['name'] not in df2_B['Name'].unique() else 'Match name', axis=1)

df2_A ['Match date?'] = df2_A .apply(lambda row: 'Match date' if row['Match with B?'] == 'Yes' else ('Not match date' if row['Date'] not in df2_B.loc[df2_B['B_Concate'] == row['A_Concate'], 'Date'].values else 'Match date'), axis=1)

df2_A ['Match quantity?'] = df2_A .apply(lambda row: 'Not match quantity' if row['Match with B?'] == 'No' and row['Match part?'] == 'Not match part' else ('Not match quantity' if row['Match with B?'] == 'No' and row['SUGGESTED QTY'] not in df2_B['Quantity'].unique() else 'Match quantity'), axis=1)

Which part can be improved so that the output can be returned based on the concatenated rows?

Solution

IIUC, you can use merge this way :

out = (pd.merge(df2_A, df2_B, on=list(df2_A.columns), how="left", indicator="Match with B?")
           .replace({"Match with B?": {"both": "Yes", "left_only": "No"}}))

out["Why ?"] = (pd.concat([pd.merge(df2_A[[col]].drop_duplicates(), df2_B[[col]].drop_duplicates(),
                                    on=col, how="left", indicator=f"check_{i}")
                           for i, col in enumerate(df2_A.columns)], axis=1).filter(like="check")
                   .set_axis(df2_A.columns, axis=1).replace({"both": True, "left_only": False})
                   .apply(lambda x: np.where(x.eq(False), x.name, None)).stack().groupby(level=0).agg(list)
      )

Output :

print(out)

  Name        Date  Quantity Match with B?                   Why ?
0  foo  2023-02-11         1            No        [Date, Quantity]
1  bar  2023-03-22         2           Yes                     NaN
2  baz  2023-01-05         3            No  [Name, Date, Quantity]
3  qux  2023-04-18         4            No            [Name, Date]
4  bar  2023-05-01         5            No                  [Date]

If you want to expose the match check of each column, use this :

tmp = (pd.merge(df2_A, df2_B, on=list(df2_A.columns), how="left", indicator="Match with B?")
           .replace({"Match with B?": {"both": "Yes", "left_only": "No"}}))

out = tmp.join(pd.concat([pd.merge(df2_A[[col]].drop_duplicates(), df2_B[[col]].drop_duplicates(),
                                    on=col, how="left", indicator=f"check_{i}")
                           for i, col in enumerate(df2_A.columns)], axis=1).filter(like="check")
                   .set_axis(df2_A.columns, axis=1).replace({"both": True, "left_only": False})
                   .add_prefix("Match ").add_suffix(" ?").replace({True: "Yes", False: "No"}).fillna("Yes")
      )

Output :

print(out)

  Name        Date  Quantity Match with B? Match Name ? Match Date ? Match Quantity ?
0  foo  2023-02-11         1            No          Yes           No               No
1  bar  2023-03-22         2           Yes          Yes          Yes              Yes
2  baz  2023-01-05         3            No           No           No               No
3  qux  2023-04-18         4            No           No           No              Yes
4  bar  2023-05-01         5            No          Yes           No              Yes

With highlights to see the results :

Inputs used :

df2_A = pd.DataFrame({
    "Name": ["foo", "bar", "baz", "qux", "bar"],
    "Date": ["2023-02-11", "2023-03-22", "2023-01-05", "2023-04-18", "2023-05-01"],
    "Quantity": [1, 2, 3, 4, 5]
})

df2_B = pd.DataFrame({
    "Name": ["foo", "xyz", "foo", "bar"],
    "Date": ["2023-02-30", "2023-02-25", "2023-03-10", "2023-03-22"],
    "Quantity": [5, 4, 6, 2]
})