Search code examples
pythonpandaspandas-groupbymulti-index

Pandas remove all 0s until first nonzero value within each level of multiindex


I'm dealing with a pandas MultiIndex dataset where a lot of the different groups of data (Item 1, Item 2, etc.) have 0s. I only want to delete rows with 0 where 0s are at the beginning of the Item group. If 0s come after a non-zero number, I don't want to delete it. How can I remove all zeroes until the first nonzero value within each group (outside index)?

Original data

                          Quantity
Item1           2016-05-08  0.0
                2016-05-15  0.0
                2016-05-22  0.0
                2016-05-29  456
                2016-06-05   22
Item2           2018-03-08  0.0
                2018-03-15  300
                2018-03-21  0.0
                2018-03-29  0.0
                2018-03-05  433

Desired output

                          Quantity
Item1           2016-05-29  456
                2016-06-05   22
Item2           2018-03-15  300
                2018-03-21  0.0
                2018-03-29  0.0
                2018-03-05  433

Solution

  • Essentially an island-and-gap problem. Every time you hit a non-zero value in the group, you create a new island. Then your job becomes removing those islands 0. The code:

    islands = (df['Quantity'] != 0).groupby(level=0).cumsum()
    df[islands != 0]
    

    A slow motion of what happened:

                        Quantity   Quantity != 0?   cumsum
    Item1   2016-05-08       0.0           False         0
            2016-05-15       0.0           False         0
            2016-05-22       0.0           False         0
            2016-05-29       456            True         1
            2016-06-05        22            True         2
    ------------------------------------------------------
    Item2   2018-03-08       0.0           False         0
            2018-03-15       300            True         1
            2018-03-21       0.0           False         1
            2018-03-29       0.0           False         1
            2018-03-05       433            True         2
    

    And in the end, you remove the rows with cumsum == 0