I am trying to comparing dataframe df1
with df2
by column cust_id
and town_id, and get all rows pf cust_id for which the town_id has changed. I can use list comprehension to get the list of cust_id
which are in df21
but not in df2
or vice-versa. But how do I use town_id change to find the cust_id
for which town_id has changed and generate the output as a dataframe
?
df1
name cust_id town_id
1 cxa c1001 t001
2 cxb c1002 t001
3 cxc c1003 t001
4 cxd c1004 t002
df2
name cust_id town_id
1 cxa c1001 t002
2 cxb c1002 t001
3 cxd c1004 t001
4 cxe c1005 t001
5 cxf c1006 t001
output
name cust_id townId_initial town_id_latter
1 cxa c1001 t001 t002
2 cxd c1006 t002 t001
If I understand you correctly, you want merge the two dataframes by cust_id
and then find rows where town_id
s are different:
out = df1.merge(df2, on="cust_id", how="inner", suffixes=["_initial", "_latter"])
out = out[out.town_id_initial != out.town_id_latter]
print(
out[["name_initial", "cust_id", "town_id_initial", "town_id_latter"]].rename(
columns={"name_initial": "name"}
)
)
Prints:
name cust_id town_id_initial town_id_latter
0 cxa c1001 t001 t002
2 cxd c1004 t002 t001