I have customers duplicates with different status because there is a row for each customer subscription/product. I want to generate a new_status
for the customer and for it to be 'canceled', every subscription status must be 'canceled' together.
I used:
df['duplicated'] = df.groupby('customer', as_index=False)['customer'].cumcount()
to separate every duplicated in a index to indicate the duplicated values
Customer | Status | new_status | duplicated
X |canceled| | 0
X |canceled| | 1
X |active | | 2
Y |canceled| | 0
A |canceled| | 0
A |canceled| | 1
B |active | | 0
B |canceled| | 1
Thus, I'd like to use .apply and/or .loc to generate:
Customer | Status | new_status | duplicated
X |canceled| | 0
X |canceled| | 1
X |active | | 2
Y |canceled| | 0
A |canceled| canceled | 0
A |canceled| canceled | 1
B |active | | 0
B |canceled| | 1
From what I understand, you can try doing:
df['new_status']=(df.groupby('Customer')['Status'].
transform(lambda x: x.eq('canceled').all()).map({True:'cancelled'})).fillna(df.new_status)
print(df)
Customer Status new_status duplicated
0 X canceled 0
1 X canceled 1
2 X active 2
3 Y canceled cancelled 0
4 A canceled cancelled 0
5 A canceled cancelled 1
6 B active 0
7 B canceled 1
EDIT since expected o/p was changed:
df['new_status']=(df.groupby('Customer')['Status'].
transform(lambda x: x.duplicated(keep=False)&(x.eq('canceled').all()))
.map({True:'cancelled',False:''}))
print(df)
Customer Status new_status duplicated
0 X canceled 0
1 X canceled 1
2 X active 2
3 Y canceled 0
4 A canceled cancelled 0
5 A canceled cancelled 1
6 B active 0
7 B canceled 1