I'm dealing with a pandas MultiIndex dataset where a lot of the different groups of data (Item 1, Item 2, etc.) have 0s. I only want to delete rows with 0 where 0s are at the beginning of the Item group. If 0s come after a non-zero number, I don't want to delete it. How can I remove all zeroes until the first nonzero value within each group (outside index)?
Original data
Quantity
Item1 2016-05-08 0.0
2016-05-15 0.0
2016-05-22 0.0
2016-05-29 456
2016-06-05 22
Item2 2018-03-08 0.0
2018-03-15 300
2018-03-21 0.0
2018-03-29 0.0
2018-03-05 433
Desired output
Quantity
Item1 2016-05-29 456
2016-06-05 22
Item2 2018-03-15 300
2018-03-21 0.0
2018-03-29 0.0
2018-03-05 433
Essentially an island-and-gap problem. Every time you hit a non-zero value in the group, you create a new island. Then your job becomes removing those islands 0. The code:
islands = (df['Quantity'] != 0).groupby(level=0).cumsum()
df[islands != 0]
A slow motion of what happened:
Quantity Quantity != 0? cumsum
Item1 2016-05-08 0.0 False 0
2016-05-15 0.0 False 0
2016-05-22 0.0 False 0
2016-05-29 456 True 1
2016-06-05 22 True 2
------------------------------------------------------
Item2 2018-03-08 0.0 False 0
2018-03-15 300 True 1
2018-03-21 0.0 False 1
2018-03-29 0.0 False 1
2018-03-05 433 True 2
And in the end, you remove the rows with cumsum == 0