My inventory data contains the columns sale_date, saleID, region, and product. Each product in each region can be sold multiple times within that region, but each sale has a unique saleID.
Now I have a list of saleID values, and I need to select the rows that contain them. I also need to find the previous saleID (and its whole row) for the same product in the same region. In other words, I need to group by region and product first, find each listed saleID, and then find the row that precedes it within its group.
df
sale_date saleID region product
2017-05-01 A12 Asia car1
2018-03-05 B21 Asia car1
2018-04-01 C23 Asia car1
2018-09-06 C21 Canada car2
2019-01-01 E11 Canada car2
2019-03-02 E23 USA car3
2019-01-02 G41 USA car3
2015-02-02 H11 Mexico car4
2015-02-03 I14 Mexico car4
Expected output
saleID_list = ["B21", "E11", "I14"]
sale_date saleID region product
2017-05-01 A12 Asia car1
2018-03-05 B21 Asia car1
2018-09-06 C21 Canada car2
2019-01-01 E11 Canada car2
2015-02-02 H11 Mexico car4
2015-02-03 I14 Mexico car4
I know we can select the rows with the listed saleID values using
df.loc[df['saleID'].isin(saleID_list)]
But can I find the previous row within each group? I tried something like the following with shift, but it raises AttributeError: 'DataFrameGroupBy' object has no attribute 'loc':
df.groupby(['region', 'product']).loc[df['saleID'].isin(saleID_list)].shift(1)
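Use DataFrameGroupBy.shift to build a helper Series, test it with Series.isin, and combine the result with the original mask on the saleID column using | (bitwise OR). Because shift(-1) gives each row the saleID of the next row in its group, a row whose shifted value is in the list is the row directly before a listed sale: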
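saleID_list = ["B21", "E11", "I14"]
# saleID of the next row within each (region, product) group
s = df.groupby(['region', 'product'])['saleID'].shift(-1)
# keep the listed rows and the rows directly before them
df = df[df['saleID'].isin(saleID_list) | s.isin(saleID_list)]
print(df)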
sale_date saleID region product
0 2017-05-01 A12 Asia car1
1 2018-03-05 B21 Asia car1
3 2018-09-06 C21 Canada car2
4 2019-01-01 E11 Canada car2
7 2015-02-02 H11 Mexico car4
8 2015-02-03 I14 Mexico car4
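For anyone who wants to try this end to end, here is a minimal self-contained sketch of the approach above. It rebuilds the sample frame from the question and keeps the two masks separate so each step can be inspected. The names next_id, in_list, before_listed, and result are only illustrative, and the sketch assumes the rows are already in the desired order within each group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sale_date": ["2017-05-01", "2018-03-05", "2018-04-01",
                  "2018-09-06", "2019-01-01", "2019-03-02",
                  "2019-01-02", "2015-02-02", "2015-02-03"],
    "saleID": ["A12", "B21", "C23", "C21", "E11", "E23", "G41", "H11", "I14"],
    "region": ["Asia", "Asia", "Asia", "Canada", "Canada",
               "USA", "USA", "Mexico", "Mexico"],
    "product": ["car1", "car1", "car1", "car2", "car2",
                "car3", "car3", "car4", "car4"],
})
saleID_list = ["B21", "E11", "I14"]

# For each row, the saleID of the next row in the same (region, product)
# group; the last row of each group gets NaN.
next_id = df.groupby(["region", "product"])["saleID"].shift(-1)

# Rows whose own saleID is in the list.
in_list = df["saleID"].isin(saleID_list)
# Rows that sit directly before a listed row within their group.
before_listed = next_id.isin(saleID_list)

# Combine both masks; the original df is left unchanged.
result = df[in_list | before_listed]
print(result)
```

If the rows are not already ordered by date within each group, sorting first, for example with df.sort_values(["region", "product", "sale_date"]), should make "previous row" mean "previous sale" before the shift is applied.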