Search code examples
pythonpandasdataframerunning-count

Add column with a specific sequence of numbers depending on value


I have this dataframe:

df = pd.DataFrame({
        'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        'Condition': [False, False, True, False, False, False, False, False, False, False, True, False]})

     ID   Condition
0     1       False
1     1       False
2     1        True
3     1       False
4     1       False
5     1       False
6     1       False
7     1       False
8     1       False
9     1       False
10    1        True
11    1       False

I want to add a new column Sequence with a sequence of numbers. The condition is when the first True appears in the Condition column, the following rows must contain the sequence 1, 2, 3, 1, 2, 3... until another True appears again, at which point the sequence is restarted again. Furthermore, ideally, until the first True appears, the values in the new column should be 0. El resultado final sería:

     ID   Condition  Sequence
0     1       False         0
1     1       False         0
2     1        True         1
3     1       False         2
4     1       False         3
5     1       False         1
6     1       False         2
7     1       False         3
8     1       False         1
9     1       False         2
10    1        True         1
11    1       False         2

I have tried to do it with cumsum and cumcount but I can't find the exact code.

Any suggestion?


Solution

  • Let us do cumsum to identify blocks of rows, then group the dataframe by blocks and use cumcount to create sequential counter, then with some simple maths we can get the output

    b = df['Condition'].cumsum()
    df['Seq'] = df.groupby(b).cumcount().mod(3).add(1).mask(b < 1, 0)
    

    Explained

    Identify blocks/groups of rows using cumsum

    b = df['Condition'].cumsum()
    print(b)
    
    0     0
    1     0
    2     1 # -- group 1 start --
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     1 # -- group 1 ended --
    10    2
    11    2
    Name: Condition, dtype: int32
    

    Group the dataframe by the blocks and use cumcount to create a sequential counter per block

    c = df.groupby(b).cumcount()
    print(c)
    
    0     0
    1     1
    2     0
    3     1
    4     2
    5     3
    6     4
    7     5
    8     6
    9     7
    10    0
    11    1
    dtype: int64
    

    Modulo(%) divide the sequential counter by 3 to create a repeating sequence that repeats every three rows

    c = c.mod(3).add(1)
    print(c)
    
    0     1
    1     2
    2     1
    3     2
    4     3
    5     1
    6     2
    7     3
    8     1
    9     2
    10    1
    11    2
    dtype: int64
    

    Mask the values in sequence with 0 where the group(b) is < 1

    c = c.mask(b < 1, 0)
    print(c)
    
    0     0
    1     0
    2     1
    3     2
    4     3
    5     1
    6     2
    7     3
    8     1
    9     2
    10    1
    11    2
    

    Result

        ID  Condition  Seq
    0    1      False    0
    1    1      False    0
    2    1       True    1
    3    1      False    2
    4    1      False    3
    5    1      False    1
    6    1      False    2
    7    1      False    3
    8    1      False    1
    9    1      False    2
    10   1       True    1
    11   1      False    2