Search code examples
rconditional-statementsdplyrrle

How to label groups in df based on specific sequence of values in a column


I have a dataframe with the id and value columns indicated below, but want to determine the Status column based on values in the value column, by id groups.

  
x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
                serial = rep(1:10,3),
                value = c(rep(1,4), rep(0,3), rep(1,3),
                          rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
                          rep(c(1,0),5)),
                status = c(rep("Fluctuating", 10),
                           rep("Fluctuating", 10),
                           rep("Not fluctuating", 10)))
   id serial value          status
1   1      1     1     Fluctuating
2   1      2     1     Fluctuating
3   1      3     1     Fluctuating
4   1      4     1     Fluctuating
5   1      5     0     Fluctuating
6   1      6     0     Fluctuating
7   1      7     0     Fluctuating
8   1      8     1     Fluctuating
9   1      9     1     Fluctuating
10  1     10     1     Fluctuating
11  2      1     1     Fluctuating
12  2      2     1     Fluctuating
13  2      3     1     Fluctuating
14  2      4     1     Fluctuating
15  2      5     0     Fluctuating
16  2      6    -1     Fluctuating
17  2      7    -1     Fluctuating
18  2      8     1     Fluctuating
19  2      9     1     Fluctuating
20  2     10     1     Fluctuating
21  3      1     1 Not fluctuating
22  3      2     0 Not fluctuating
23  3      3     1 Not fluctuating
24  3      4     0 Not fluctuating
25  3      5     1 Not fluctuating
26  3      6     0 Not fluctuating
27  3      7     1 Not fluctuating
28  3      8     0 Not fluctuating
29  3      9     1 Not fluctuating
30  3     10     0 Not fluctuating

Here, a group is considered to be fluctuating if three or more 1s is followed by 3 or more (0s or -1s), followed by 3 or more 1s again. It would also be considered fluctuating if three or more alternating 0s-1s-0s, -1s-0s-1s, etc.

Wondering what the best way is to assign the status column, preferably using dplyr?

Thanks!


Solution

  • Using rle function and dplyr library

    x %>% 
      mutate(value_new = ifelse(value == -1, 0, value)) %>% 
      group_by(id) %>% 
      mutate(status = ifelse(all(rle(value_new)$lengths >= 3), "Fluctuating", "Not fluctuating")) %>% 
      select(-value_new) 
    

    Output

    # A tibble: 30 x 4
    # Groups:   id [3]
          id serial value status         
       <dbl>  <int> <dbl> <chr>          
     1     1      1     1 Fluctuating    
     2     1      2     1 Fluctuating    
     3     1      3     1 Fluctuating    
     4     1      4     1 Fluctuating    
     5     1      5     0 Fluctuating    
     6     1      6     0 Fluctuating    
     7     1      7     0 Fluctuating    
     8     1      8     1 Fluctuating    
     9     1      9     1 Fluctuating    
    10     1     10     1 Fluctuating    
    11     2      1     1 Fluctuating    
    12     2      2     1 Fluctuating    
    13     2      3     1 Fluctuating    
    14     2      4     1 Fluctuating    
    15     2      5     0 Fluctuating    
    16     2      6    -1 Fluctuating    
    17     2      7    -1 Fluctuating    
    18     2      8     1 Fluctuating    
    19     2      9     1 Fluctuating    
    20     2     10     1 Fluctuating    
    21     3      1     1 Not fluctuating
    22     3      2     0 Not fluctuating
    23     3      3     1 Not fluctuating
    24     3      4     0 Not fluctuating
    25     3      5     1 Not fluctuating
    26     3      6     0 Not fluctuating
    27     3      7     1 Not fluctuating
    28     3      8     0 Not fluctuating
    29     3      9     1 Not fluctuating
    30     3     10     0 Not fluctuating