Search code examples
pythonpandasdataframeconditional-statementscounter

Python Pandas Dataframe: conditional counters of one and zeros values


I want to add two columns in my dataframe with the following function using python:

  1. "counter1": it counts the number of the ones in the column "case1-0". (cumulative sum of the ones)
  2. "counter2": it counts the number of the zeros in the column "case1-0" but only if the last previous value in "counter1" is greater than 3.
case1-0 counter1 counter2
1 1 0
1 2 0
1 3 0
1 4 0
0 0 1
0 0 2
0 0 3
1 1 0
1 2 0
0 0 0
0 0 0
1 1 0

In the reality I have a time serie. Therefore the approach should be applicable for greater data frames.


Solution

  • Use this solution with a mask for compare by 1 for counter11 and for counter22 is replaced not 1 values of column counter11 to missing values and forward filling them, so possible compare for greater values like 3 and pass to numpy.where values of s helper Series:

    a = df['case1-0'].eq(1)
    b = a.cumsum()
    df['counter11'] = b.sub(b.mask(a).ffill().fillna(0)).astype(int)
    
    a1 = ~a
    b = a1.cumsum()
    s = b.sub(b.mask(a1).ffill().fillna(0)).astype(int)
    
    df['counter22'] = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)
    print (df)
        case1-0  counter1  counter2  counter11  counter22
    0         1         1         0          1          0
    1         1         2         0          2          0
    2         1         3         0          3          0
    3         1         4         0          4          0
    4         0         0         1          0          1
    5         0         0         2          0          2
    6         0         0         3          0          3
    7         1         1         0          1          0
    8         1         2         0          2          0
    9         0         0         0          0          0
    10        0         0         0          0          0
    11        1         1         0          1          0
    

    Explanation how it working:

    print (df.assign(counter0 = s,
                     replaced=df['counter11'].where(a),
                     ffill= df['counter11'].where(a).ffill(),
                     mask = df['counter11'].where(a).ffill().gt(3),
                     chained = df['counter11'].where(a).ffill().gt(3) & a1,
                     counter22 = np.where(df['counter11'].where(a).ffill().gt(3) & a1, s, 0)))
    

        case1-0  counter1  counter2  counter11  counter0  replaced  ffill   mask  \
    0         1         1         0          1         0       1.0    1.0  False   
    1         1         2         0          2         0       2.0    2.0  False   
    2         1         3         0          3         0       3.0    3.0  False   
    3         1         4         0          4         0       4.0    4.0   True   
    4         0         0         1          0         1       NaN    4.0   True   
    5         0         0         2          0         2       NaN    4.0   True   
    6         0         0         3          0         3       NaN    4.0   True   
    7         1         1         0          1         0       1.0    1.0  False   
    8         1         2         0          2         0       2.0    2.0  False   
    9         0         0         0          0         1       NaN    2.0  False   
    10        0         0         0          0         2       NaN    2.0  False   
    11        1         1         0          1         0       1.0    1.0  False   
    
        chained  counter22  
    0     False          0  
    1     False          0  
    2     False          0  
    3     False          0  
    4      True          1  
    5      True          2  
    6      True          3  
    7     False          0  
    8     False          0  
    9     False          0  
    10    False          0  
    11    False          0  
    

    Alternative solution

    counter is for count consecutive 0 and 1 values by compare shifted values for groups with GroupBy.cumcount, then for counter11 set 0 if not match condition mask in Series.where and for counter22 is used same solution like above (only changed variables):

    mask = df['case1-0'].eq(1)
    counter = df.groupby(df['case1-0'].ne(df['case1-0'].shift()).cumsum()).cumcount().add(1)
    df['counter11'] = counter.where(mask, 0)
    df['counter22'] = np.where(df['counter11'].where(mask).ffill().gt(3) & ~mask, counter, 0)
    
    print (df)
        case1-0  counter1  counter2  counter11  counter22
    0         1         1         0          1          0
    1         1         2         0          2          0
    2         1         3         0          3          0
    3         1         4         0          4          0
    4         0         0         1          0          1
    5         0         0         2          0          2
    6         0         0         3          0          3
    7         1         1         0          1          0
    8         1         2         0          2          0
    9         0         0         0          0          0
    10        0         0         0          0          0
    11        1         1         0          1          0