Search code examples
pythonpandaslist-comprehension

Find the rows in two dataframe where a 2nd column's value has changed for a 1st column which is a key column?


I am trying to comparing dataframe df1 with df2 by column cust_id and town_id, and get all rows pf cust_id for which the town_id has changed. I can use list comprehension to get the list of cust_id which are in df21 but not in df2 or vice-versa. But how do I use town_id change to find the cust_id for which town_id has changed and generate the output as a dataframe?

df1
  name   cust_id town_id
1 cxa    c1001    t001
2 cxb    c1002    t001 
3 cxc    c1003    t001
4 cxd    c1004    t002

df2
  name   cust_id  town_id
1 cxa    c1001    t002
2 cxb    c1002    t001 
3 cxd    c1004    t001
4 cxe    c1005    t001
5 cxf    c1006    t001

output
  name    cust_id townId_initial  town_id_latter
1 cxa    c1001    t001              t002
2 cxd    c1006    t002              t001

Solution

  • If I understand you correctly, you want merge the two dataframes by cust_id and then find rows where town_ids are different:

    out = df1.merge(df2, on="cust_id", how="inner", suffixes=["_initial", "_latter"])
    out = out[out.town_id_initial != out.town_id_latter]
    
    print(
        out[["name_initial", "cust_id", "town_id_initial", "town_id_latter"]].rename(
            columns={"name_initial": "name"}
        )
    )
    

    Prints:

      name cust_id town_id_initial town_id_latter
    0  cxa   c1001            t001           t002
    2  cxd   c1004            t002           t001