
Create sequence counter based on condition


I have a dataset like this,

       x time
1   TRUE    9
2   TRUE    8
3   TRUE   10
4   TRUE    5
5   TRUE   16
6  FALSE    2
7  FALSE   17
8  FALSE    6
9   TRUE   11
10  TRUE    7
11  TRUE   20
12  TRUE    3
13  TRUE    9
14 FALSE    4
15 FALSE    2
16 FALSE   10
17  TRUE    3
18  TRUE    6
19  TRUE   15

Using R, I would like to generate a new variable that numbers the runs of rows defined by x and time. Specifically, I would like to scan the data from the beginning and assign a number (e.g., 1) to the first row. Whenever x changes between TRUE and FALSE, the number increases by 1. While x is FALSE, the number stays the same. While x is TRUE, the number stays the same for rows with time < 10, but it also increases by 1 at every row with time >= 10, and then stays the same until the next change.

In other words, a change between TRUE and FALSE counts as a condition change. In addition, while x is TRUE, every row with time >= 10 also starts a new condition.

The output I would like to get is like this.

       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9

I tried rleid(x), but it does not take the time variable into account. I would appreciate any advice on how to solve this in R!


Solution

  • Here is one option with rleid from data.table: pass it both the column 'x' and a numeric index built from the 'time' column (a running count of the TRUE rows with time >= 10)

    library(data.table)
    setDT(df1)[, count := rleid(x, replace(x, x, cumsum(time[x] >= 10)))]
    

    -output

            x  time count
        <lgcl> <int> <int>
     1:   TRUE     9     1
     2:   TRUE     8     1
     3:   TRUE    10     2
     4:   TRUE     5     2
     5:   TRUE    16     3
     6:  FALSE     2     4
     7:  FALSE    17     4
     8:  FALSE     6     4
     9:   TRUE    11     5
    10:   TRUE     7     5
    11:   TRUE    20     6
    12:   TRUE     3     6
    13:   TRUE     9     6
    14:  FALSE     4     7
    15:  FALSE     2     7
    16:  FALSE    10     7
    17:   TRUE     3     8
    18:   TRUE     6     8
    19:   TRUE    15     9
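
To see why the one-liner above works, it can help to split it into its three parts. The following is only an illustrative breakdown, not a different method; the intermediate names `hits` and `idx` are made up here, and the data is the same 19-row `df1` given in the data section.

```r
library(data.table)

# Same 19-row data as in the 'data' section of this answer
df1 <- data.frame(
  x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
        TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
           20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
)

# Step 1: looking only at the TRUE rows, keep a running count of
# how many of them have had time >= 10 so far
hits <- cumsum(df1$time[df1$x] >= 10)

# Step 2: write that running count back into the TRUE positions;
# the FALSE positions are coerced from FALSE to 0
idx <- replace(df1$x, df1$x, hits)

# Step 3: rleid() starts a new id whenever x or idx differs
# from the previous row
count <- rleid(df1$x, idx)
count
```

Because `rleid()` compares the pair (x, idx) row by row, a TRUE row that follows a FALSE run and also has time >= 10 (row 9 here) still advances the counter only once.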
    

    Or with dplyr

    library(dplyr)
    # rleid() is a data.table function, so call it with its namespace
    df1 %>% 
       mutate(count = data.table::rleid(x, replace(x, x, cumsum(time[x] >= 10))))
    

    -output

           x time count
    1   TRUE    9     1
    2   TRUE    8     1
    3   TRUE   10     2
    4   TRUE    5     2
    5   TRUE   16     3
    6  FALSE    2     4
    7  FALSE   17     4
    8  FALSE    6     4
    9   TRUE   11     5
    10  TRUE    7     5
    11  TRUE   20     6
    12  TRUE    3     6
    13  TRUE    9     6
    14 FALSE    4     7
    15 FALSE    2     7
    16 FALSE   10     7
    17  TRUE    3     8
    18  TRUE    6     8
    19  TRUE   15     9
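
The rule from the question can also be written directly, without any package. This is a minimal base R sketch rather than the approach used above: it flags every row that starts a new group and takes a cumulative sum of the flags. The names `x_changed`, `new_group` and `count_base` are made up for illustration, and the first row is assumed to always belong to group 1.

```r
# Same 19-row data as in the 'data' section of this answer
df1 <- data.frame(
  x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
        TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
           20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
)

# TRUE where x differs from the previous row (the first row has no predecessor)
x_changed <- c(FALSE, df1$x[-1] != df1$x[-nrow(df1)])

# A new group starts when x changes, or when x is TRUE and time >= 10
new_group <- x_changed | (df1$x & df1$time >= 10)

# Assumption: the first row always opens group 1, whatever its time value
new_group[1] <- FALSE

# Each flagged row advances the counter by one
count_base <- cumsum(new_group) + 1L
count_base
```

A row where both conditions hold at once produces a single TRUE flag, so it advances the counter by one, which matches the desired output in the question.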
    

    data

    df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
    TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 
    7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
    "14", "15", "16", "17", "18", "19"), class = "data.frame")