I have some customer data over dates and I want to see if for example they choose another product over time. Ideally, i'd like to copy the two columns where the changes occurred into a new column.
So, if I had a table like
period, Customer , product
2020-01, Cust1, 12 TS
2020-02, Cust1, 12 TS
2020-03, Cust1, 14 SLM
2020-01, Cust2, 12 SLM
2020-02, Cust2, 12 TS
2020-03, Cust2, 14 SLM
So cust1 went over time from TS to SLM, whereas Cust2 went from SLM to TS then the opposite. The final column should look like:
period, Customer , product , change
2020-01, Cust1, 12 TS , NAN
2020-02, Cust1, 12 TS , NAN
2020-03, Cust1, 14 SLM, from TS to SLM
2020-01, Cust2, 12 SLM, NAN
2020-02, Cust2, 12 TS, from SLM to TS
2020-03, Cust2, 14 SLM, from TS to SLM
I have look in many solutions avaliable like here, but I couldn't manage to do it the way I wanted.
We can first group
the dataframe by Customer
, then shift
to check if there is a change. After that we can compare and determine the change.
df['prev_product'] = df.groupby(['Customer'])['product'].shift().bfill()
df['change'] = df[['product', 'prev_product']].apply(lambda x: None if(x[0] == x[1]) else f'from {x[1]} to {x[0]}', axis=1)
period Customer n product prev_product change
0 2020-01 Cust1 12 TS TS None
1 2020-02 Cust1 12 TS TS None
2 2020-03 Cust1 14 SLM TS from TS to SLM
3 2020-01 Cust2 12 SLM SLM None
4 2020-02 Cust2 12 TS SLM from SLM to TS
5 2020-03 Cust2 14 SLM TS from TS to SLM
NOTE: df.drop('prev_product',axis=1)
incase not required.