Search code examples
pythonpandasapplypandas-loc

Pandas Apply with condition


I have customers duplicates with different status because there is a row for each customer subscription/product. I want to generate a new_status for the customer and for it to be 'canceled', every subscription status must be 'canceled' together.

I used:

df['duplicated'] = df.groupby('customer', as_index=False)['customer'].cumcount()

to separate every duplicated in a index to indicate the duplicated values

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled|            | 0
 A       |canceled|            | 1
 B       |active  |            | 0
 B       |canceled|            | 1

Thus, I'd like to use .apply and/or .loc to generate:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled| canceled   | 0
 A       |canceled| canceled   | 1
 B       |active  |            | 0
 B       |canceled|            | 1

Solution

  • From what I understand, you can try doing:

    df['new_status']=(df.groupby('Customer')['Status'].
      transform(lambda x: x.eq('canceled').all()).map({True:'cancelled'})).fillna(df.new_status)
    print(df)
    
        Customer    Status new_status  duplicated
    0   X         canceled             0         
    1   X         canceled             1         
    2   X         active               2         
    3   Y         canceled  cancelled  0         
    4   A         canceled  cancelled  0         
    5   A         canceled  cancelled  1         
    6   B         active               0         
    7   B         canceled             1   
    

    EDIT since expected o/p was changed:

    df['new_status']=(df.groupby('Customer')['Status'].
                 transform(lambda x: x.duplicated(keep=False)&(x.eq('canceled').all()))
                             .map({True:'cancelled',False:''}))
    print(df)
    
      Customer    Status new_status  duplicated
    0   X         canceled             0         
    1   X         canceled             1         
    2   X         active               2         
    3   Y         canceled             0         
    4   A         canceled  cancelled  0         
    5   A         canceled  cancelled  1         
    6   B         active               0         
    7   B         canceled             1