Search code examples
rconditional-statementslogic

flag = 1 if there is a change from 1 to 0 in R


Is there a quick way to assign flag to 1 if for a particular variable there is a change from 1 to 0, so that I get the below. This variable is only allowed to go from from 0 to 1, it is not possible to see 1 to 0 and I would like to flag if that happens.

organisation | time       | variable | flag
A            | 2020-01-01 | 0        | 0
A            | 2020-02-01 | 0        | 0   
A            | 2020-03-01 | 0        | 0   
A            | 2020-04-01 | 1        | 0       
A            | 2020-05-01 | 1        | 0      
A            | 2020-06-01 | 1        | 0     
A            | 2020-07-01 | 1        | 0    
B            | 2020-01-01 | 0        | 1      
B            | 2020-02-01 | 0        | 1     
B            | 2020-03-01 | 0        | 1     
B            | 2020-04-01 | 1        | 1       
B            | 2020-05-01 | 0        | 1      
B            | 2020-06-01 | 1        | 1     
B            | 2020-07-01 | 1        | 1     

Edit:

Extra question: what if variable is 0,1,2 and I now want to flag if there is a change from 2 to 1?

organisation | time       | variable | flag
A            | 2020-01-01 | 0        | 0
A            | 2020-02-01 | 0        | 0   
A            | 2020-03-01 | 0        | 0   
A            | 2020-04-01 | 1        | 0       
A            | 2020-05-01 | 1        | 0      
A            | 2020-06-01 | 2        | 0     
A            | 2020-07-01 | 2        | 0    
B            | 2020-01-01 | 0        | 1      
B            | 2020-02-01 | 0        | 1     
B            | 2020-03-01 | 0        | 1     
B            | 2020-04-01 | 1        | 1       
B            | 2020-05-01 | 2        | 1      
B            | 2020-06-01 | 1        | 1     
B            | 2020-07-01 | 1        | 1     

Solution

  • Thank you for clarifying your problem in your comment and edited question. Here is a solution that I think will work with your actual data:

    library(tidyverse)
    
    df <- data.frame(
      stringsAsFactors = FALSE,
                organisation = c("A","A","A","A",
                                 "A","A","A","B","B","B","B","B","B","B"),
                        time = c("2020-01-01",
                                 "2020-02-01","2020-03-01","2020-04-01","2020-05-01",
                                 "2020-06-01","2020-07-01","2020-01-01",
                                 "2020-02-01","2020-03-01","2020-04-01","2020-05-01",
                                 "2020-06-01","2020-07-01"),
                    variable = c(0L,0L,0L,1L,1L,
                                 1L,1L,0L,0L,0L,1L,0L,1L,1L),
                        flag = c(0L,0L,0L,0L,0L,
                                 0L,0L,1L,1L,1L,1L,1L,1L,1L)
          )
    
    df %>%
      group_by(organisation) %>%
      mutate(flag = +any(lag(variable, default = 0) == 1 & variable == 0))
    #> # A tibble: 14 × 4
    #> # Groups:   organisation [2]
    #>    organisation time       variable  flag
    #>    <chr>        <chr>         <int> <int>
    #>  1 A            2020-01-01        0     0
    #>  2 A            2020-02-01        0     0
    #>  3 A            2020-03-01        0     0
    #>  4 A            2020-04-01        1     0
    #>  5 A            2020-05-01        1     0
    #>  6 A            2020-06-01        1     0
    #>  7 A            2020-07-01        1     0
    #>  8 B            2020-01-01        0     1
    #>  9 B            2020-02-01        0     1
    #> 10 B            2020-03-01        0     1
    #> 11 B            2020-04-01        1     1
    #> 12 B            2020-05-01        0     1
    #> 13 B            2020-06-01        1     1
    #> 14 B            2020-07-01        1     1
    

    Extra question:

    The logic is the same - if the current value for variable equals 1 and the previous value ("lagged value") for variable equals 2 (i.e. a change from 2 to 1), flag equals 1 for that whole group:

    df <- data.frame(
      stringsAsFactors = FALSE,
                organisation = c("A","A","A","A",
                                 "A","A","A","B","B","B","B","B","B","B"),
                        time = c("2020-01-01",
                                 "2020-02-01","2020-03-01","2020-04-01","2020-05-01",
                                 "2020-06-01","2020-07-01","2020-01-01",
                                 "2020-02-01","2020-03-01","2020-04-01","2020-05-01",
                                 "2020-06-01","2020-07-01"),
                    variable = c(0L,0L,0L,1L,1L,
                                 2L,2L,0L,0L,0L,1L,2L,1L,1L),
                        flag = c(0L,0L,0L,0L,0L,
                                 0L,0L,1L,1L,1L,1L,1L,1L,1L)
          )
    
    df %>%
      group_by(organisation) %>%
      mutate(flag = +any(lag(variable, default = 0) == 2 & variable == 1))
    #> # A tibble: 14 × 4
    #> # Groups:   organisation [2]
    #>    organisation time       variable  flag
    #>    <chr>        <chr>         <int> <int>
    #>  1 A            2020-01-01        0     0
    #>  2 A            2020-02-01        0     0
    #>  3 A            2020-03-01        0     0
    #>  4 A            2020-04-01        1     0
    #>  5 A            2020-05-01        1     0
    #>  6 A            2020-06-01        2     0
    #>  7 A            2020-07-01        2     0
    #>  8 B            2020-01-01        0     1
    #>  9 B            2020-02-01        0     1
    #> 10 B            2020-03-01        0     1
    #> 11 B            2020-04-01        1     1
    #> 12 B            2020-05-01        2     1
    #> 13 B            2020-06-01        1     1
    #> 14 B            2020-07-01        1     1
    

    Created on 2022-07-19 by the reprex package (v2.0.1)