I have DataFrame like this:
sale_id dt receipts_qty
31 196.0 2017-02-19 95.0
32 203.0 2017-02-20 101.0
33 196.0 2017-02-21 105.0
34 196.0 2017-02-22 112.0
35 196.0 2017-02-23 118.0
36 196.0 2017-02-24 135.0
37 196.0 2017-02-25 135.0
38 196.0 2017-02-26 124.0
40 203.0 2017-02-27 290.0
39 196.0 2017-02-27 84.0
42 203.0 2017-02-28 330.0
41 196.0 2017-02-28 124.0
43 196.0 2017-03-01 100.0
44 203.0 2017-03-01 361.0
I have to drop duplicates by dt
and keep the rows where sale_id == 196
. I have found only drop_duplicates('dt', keep='last')
and drop_duplicates('dt', keep='first')
but it isn't what I need.
DataFrame I want to get:
sale_id dt receipts_qty
31 196.0 2017-02-19 95.0
32 203.0 2017-02-20 101.0
33 196.0 2017-02-21 105.0
34 196.0 2017-02-22 112.0
35 196.0 2017-02-23 118.0
36 196.0 2017-02-24 135.0
37 196.0 2017-02-25 135.0
38 196.0 2017-02-26 124.0
39 196.0 2017-02-27 84.0
41 196.0 2017-02-28 124.0
43 196.0 2017-03-01 100.0
Create helper column first for first value by condition, then sort_values
and drop_duplicates
.
Last cleaning - remove column a
and sort_index
:
print (df)
sale_id dt receipts_qty
31 196.0 2017-02-19 95.0
32 203.0 2017-02-20 101.0
33 196.0 2017-02-21 105.0
34 196.0 2017-02-22 112.0
35 196.0 2017-02-23 118.0
36 196.0 2017-02-24 135.0
37 196.0 2017-02-25 135.0
38 196.0 2017-02-26 124.0
40 203.0 2017-02-27 290.0
39 196.0 2017-02-27 84.0
42 103.0 2017-02-28 330.0 <-changed data, value < 196
41 196.0 2017-02-28 124.0
43 196.0 2017-03-01 100.0
44 203.0 2017-03-01 361.0
#get only values > 196
df['a'] = (df.sale_id == 196).astype(int)
#sorting by new column, remove duplicates, remove helper column
df['a'] = (df.sale_id == 196).astype(int)
df = (df.sort_values(['a','dt'], ascending=[False, True])
.drop_duplicates('dt')
.drop('a', axis=1)
.sort_index())
print (df)
sale_id dt receipts_qty
31 196.0 2017-02-19 95.0
32 203.0 2017-02-20 101.0
33 196.0 2017-02-21 105.0
34 196.0 2017-02-22 112.0
35 196.0 2017-02-23 118.0
36 196.0 2017-02-24 135.0
37 196.0 2017-02-25 135.0
38 196.0 2017-02-26 124.0
39 196.0 2017-02-27 84.0
41 196.0 2017-02-28 124.0
43 196.0 2017-03-01 100.0