I have a data set where I need to group the duplicate records and then see if any of the group has been flagged; if so, I need to indicate that it has been flagged in a new column.
Sample input; the "Concat" column is where I'm looking for duplicates:
Customer_Reference Auth_ID Concat Flag
C001 A1234 AAA TRUE
C002 A758 AAA FALSE
C003 A6149 BBB FALSE
C004 A5271 CCC FALSE
C005 A0124 BBB FALSE
The behaviour I'm looking for is that "AAA" is a duplicate where at least one of them has the Flag of "TRUE"; therefore, it should be marked as "TRUE" in the (to be added) column.
I've been able to mark the duplicate records, and the groupby function seems to chunk it up correctly, but I can't work out how to then apply an "If-Then" to each group to assign a value to the new column
An efficient approach would be to drop_duplicates
to only keep the rows with multiple occurrence of "Concat", then select those that have a True "Flag" with isin
.
keep = df.drop_duplicates('Concat').query('Flag')['Concat']
# ['AAA']
df['New'] = df['Concat'].isin(keep)
As a one-liner:
df['New'] = df['Concat'].isin(df.drop_duplicates('Concat')
.query('Flag')['Concat'])
Output:
Customer_Reference Auth_ID Concat Flag New
0 C001 A1234 AAA True True
1 C002 A758 AAA False True
2 C003 A6149 BBB False False
3 C004 A5271 CCC False False
4 C005 A0124 BBB False False
Intermediates:
# df.drop_duplicates('Concat')
Customer_Reference Auth_ID Concat Flag New
0 C001 A1234 AAA True True
2 C003 A6149 BBB False False
3 C004 A5271 CCC False False
# df.drop_duplicates('Concat').query('Flag')
Customer_Reference Auth_ID Concat Flag New
0 C001 A1234 AAA True True
# df.drop_duplicates('Concat').query('Flag')['Concat']
0 AAA
Name: Concat, dtype: object