I have dataFrame and I need to drop duplicates per group ('col1') based on a minimum value in another column 'abs(col1 - col2)', but I need to change this condition for the last group by taking the max value in 'abs(col1 - col2)' that corresponding to the last group in 'col1' where I sorted the 'col1' with ascending order. (to behave as a loop)
I need to assign the last group dynamically.
for example, If I have a data frame as
df = pd.DataFrame( {'col0':['A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],'col1':[1,1,1,2,2,2,3,3,3,4,4,4,2,2,2,3,3,3,4,4,4,5,5,5], 'col2':[2,3,4,1,3,4,1,2,4,1,2,3,3,4,5,2,4,5,2,3,5,2,3,4]})
df['abs(col1 - col2)']=abs(df['col1']-df['col2'])
my trial:
df.sort_values(by=['col0','col1','abs(col1 - col2)','col2'],ascending=[True,True,True,False]).drop_duplicates(['col0','col1'])
the resulting as follow:
If I understand correctly, you want each a different sort order each time you reach a maximum on col1
.
df.groupby(['col0'], as_index=False)['col1'].max()
df
as you diddf
:pd.merge(df, col1_max_groups)
DataFrame
with the new valuesFull example:
col1_max_groups = df.groupby(['col0'], as_index=False)['col1'].max()
deduped = df.sort_values(['col0', 'col1', 'abs(col1 - col2)', 'col2'],
ascending=[True, True, True, False]) \
.drop_duplicates(['col0', 'col1']) \
.set_index(['col0', 'col1'])
update = pd.merge(df, col1_max_groups) \
.sort_values(['col0', 'col1', 'abs(col1 - col2)', 'col2'],
ascending=[True, True, False, False]) \
.drop_duplicates(['col0', 'col1'])
deduped.update(update.set_index(['col0', 'col1']))
deduped.reset_index()
# returns
# col0 col1 col2 abs(col1 - col2)
# A 1 2 1
# A 2 3 1
# A 3 4 1
# A 4 1 3
# B 2 3 1
# B 3 4 1
# B 4 5 1
# B 5 2 3