Search code examples
pandasdataframeoptimizationzerossl

Dataframe remove zeros beggining and end


I have a dataframe df like this

| Time       | variable one |
| -----------| -------------|
| 2022-11-09 | 0            |
| 2022-11-10 | 0            |
| 2022-11-11 | 2            |
| 2022-11-12 | 7            |
| 2022-11-13 | 0            |
| 2022-11-14 | 5            |
| 2022-11-15 | 3            |
| 2022-11-16 | 0            |
| 2022-11-17 | 0            |

I need to remove all the zeros before the first non zero element and to remove all the zeros after the last non zero element. Zeros in between non zero elements should remain zero.

I solved with two while loops:

i=0
while df.loc[i,'variable one']==0:
        df.loc[i,'variable one'] = np.nan
        i=i+1
        
i=len(df['variable one'])-1
while df.loc[i,'variable one']==0:
        df.loc[i,'variable one'] = np.nan
        i=i-1

This code works, but when dealing with hundreds of columns and many thousands rows it becomes very slow. I am looking for an optimization, even removing while loops.


Solution

  • You can use a boolean mask for boolean indexing by combining cummax in a forward and reverse order:

    m = df['variable one'].ne(0)
    
    df.loc[~(m.cummax()&m[::-1].cummax()), 'variable one'] = np.nan
    
    # or
    # df['variable one'] = df['variable one'].where(m.cummax()&m[::-1].cummax())
    

    Equivalent with cummin:

    m = df['variable one'].eq(0)
    df.loc[(m.cummin()|m[::-1].cummin()), 'variable one'] = np.nan
    

    Output:

             Time  variable one
    0  2022-11-09           NaN
    1  2022-11-10           NaN
    2  2022-11-11           2.0
    3  2022-11-12           7.0
    4  2022-11-13           0.0
    5  2022-11-14           5.0
    6  2022-11-15           3.0
    7  2022-11-16           NaN
    8  2022-11-16           NaN
    

    Intermediates:

             Time  variable one      m  cummax  rev_cummax      &      ~
    0  2022-11-09             0  False   False        True  False   True
    1  2022-11-10             0  False   False        True  False   True
    2  2022-11-11             2   True    True        True   True  False
    3  2022-11-12             7   True    True        True   True  False
    4  2022-11-13             0  False    True        True   True  False
    5  2022-11-14             5   True    True        True   True  False
    6  2022-11-15             3   True    True        True   True  False
    7  2022-11-16             0  False    True       False  False   True
    8  2022-11-16             0  False    True       False  False   True