Search code examples
pythonpandasdataframelogic

A str value changed over time & Customer dimension(s) in Pandas


I have some customer data over dates and I want to see if for example they choose another product over time. Ideally, i'd like to copy the two columns where the changes occurred into a new column.

So, if I had a table like

period, Customer , product
2020-01, Cust1, 12 TS
2020-02, Cust1, 12 TS
2020-03, Cust1, 14 SLM
2020-01, Cust2, 12 SLM
2020-02, Cust2, 12 TS
2020-03, Cust2, 14 SLM

So cust1 went over time from TS to SLM, whereas Cust2 went from SLM to TS then the opposite. The final column should look like:

period, Customer , product , change
2020-01, Cust1, 12 TS , NAN
2020-02, Cust1, 12 TS , NAN
2020-03, Cust1, 14 SLM, from TS to SLM
2020-01, Cust2, 12 SLM, NAN
2020-02, Cust2, 12 TS, from SLM to TS
2020-03, Cust2, 14 SLM, from TS to SLM

I have look in many solutions avaliable like here, but I couldn't manage to do it the way I wanted.


Solution

  • We can first group the dataframe by Customer, then shift to check if there is a change. After that we can compare and determine the change.

    df['prev_product'] = df.groupby(['Customer'])['product'].shift().bfill()
    
    df['change'] = df[['product', 'prev_product']].apply(lambda x: None if(x[0] == x[1]) else f'from {x[1]}  to {x[0]}', axis=1)
    
        period  Customer    n   product prev_product    change
    0   2020-01 Cust1       12  TS      TS              None
    1   2020-02 Cust1       12  TS      TS              None
    2   2020-03 Cust1       14  SLM     TS              from TS to SLM
    3   2020-01 Cust2       12  SLM     SLM             None
    4   2020-02 Cust2       12  TS      SLM             from SLM to TS
    5   2020-03 Cust2       14  SLM     TS              from TS to SLM
    ​
    

    NOTE: df.drop('prev_product',axis=1) incase not required.