Is there a quick way to assign flag
to 1 if for a particular variable
there is a change from 1 to 0, so that I get the below. This variable
is only allowed to go from from 0 to 1, it is not possible to see 1 to 0 and I would like to flag if that happens.
organisation | time | variable | flag
A | 2020-01-01 | 0 | 0
A | 2020-02-01 | 0 | 0
A | 2020-03-01 | 0 | 0
A | 2020-04-01 | 1 | 0
A | 2020-05-01 | 1 | 0
A | 2020-06-01 | 1 | 0
A | 2020-07-01 | 1 | 0
B | 2020-01-01 | 0 | 1
B | 2020-02-01 | 0 | 1
B | 2020-03-01 | 0 | 1
B | 2020-04-01 | 1 | 1
B | 2020-05-01 | 0 | 1
B | 2020-06-01 | 1 | 1
B | 2020-07-01 | 1 | 1
Edit:
Extra question: what if variable is 0,1,2 and I now want to flag if there is a change from 2 to 1?
organisation | time | variable | flag
A | 2020-01-01 | 0 | 0
A | 2020-02-01 | 0 | 0
A | 2020-03-01 | 0 | 0
A | 2020-04-01 | 1 | 0
A | 2020-05-01 | 1 | 0
A | 2020-06-01 | 2 | 0
A | 2020-07-01 | 2 | 0
B | 2020-01-01 | 0 | 1
B | 2020-02-01 | 0 | 1
B | 2020-03-01 | 0 | 1
B | 2020-04-01 | 1 | 1
B | 2020-05-01 | 2 | 1
B | 2020-06-01 | 1 | 1
B | 2020-07-01 | 1 | 1
Thank you for clarifying your problem in your comment and edited question. Here is a solution that I think will work with your actual data:
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
organisation = c("A","A","A","A",
"A","A","A","B","B","B","B","B","B","B"),
time = c("2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01","2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01"),
variable = c(0L,0L,0L,1L,1L,
1L,1L,0L,0L,0L,1L,0L,1L,1L),
flag = c(0L,0L,0L,0L,0L,
0L,0L,1L,1L,1L,1L,1L,1L,1L)
)
df %>%
group_by(organisation) %>%
mutate(flag = +any(lag(variable, default = 0) == 1 & variable == 0))
#> # A tibble: 14 × 4
#> # Groups: organisation [2]
#> organisation time variable flag
#> <chr> <chr> <int> <int>
#> 1 A 2020-01-01 0 0
#> 2 A 2020-02-01 0 0
#> 3 A 2020-03-01 0 0
#> 4 A 2020-04-01 1 0
#> 5 A 2020-05-01 1 0
#> 6 A 2020-06-01 1 0
#> 7 A 2020-07-01 1 0
#> 8 B 2020-01-01 0 1
#> 9 B 2020-02-01 0 1
#> 10 B 2020-03-01 0 1
#> 11 B 2020-04-01 1 1
#> 12 B 2020-05-01 0 1
#> 13 B 2020-06-01 1 1
#> 14 B 2020-07-01 1 1
Extra question:
The logic is the same - if the current value for variable equals 1 and the previous value ("lagged value") for variable equals 2 (i.e. a change from 2 to 1), flag equals 1 for that whole group:
df <- data.frame(
stringsAsFactors = FALSE,
organisation = c("A","A","A","A",
"A","A","A","B","B","B","B","B","B","B"),
time = c("2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01","2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01"),
variable = c(0L,0L,0L,1L,1L,
2L,2L,0L,0L,0L,1L,2L,1L,1L),
flag = c(0L,0L,0L,0L,0L,
0L,0L,1L,1L,1L,1L,1L,1L,1L)
)
df %>%
group_by(organisation) %>%
mutate(flag = +any(lag(variable, default = 0) == 2 & variable == 1))
#> # A tibble: 14 × 4
#> # Groups: organisation [2]
#> organisation time variable flag
#> <chr> <chr> <int> <int>
#> 1 A 2020-01-01 0 0
#> 2 A 2020-02-01 0 0
#> 3 A 2020-03-01 0 0
#> 4 A 2020-04-01 1 0
#> 5 A 2020-05-01 1 0
#> 6 A 2020-06-01 2 0
#> 7 A 2020-07-01 2 0
#> 8 B 2020-01-01 0 1
#> 9 B 2020-02-01 0 1
#> 10 B 2020-03-01 0 1
#> 11 B 2020-04-01 1 1
#> 12 B 2020-05-01 2 1
#> 13 B 2020-06-01 1 1
#> 14 B 2020-07-01 1 1
Created on 2022-07-19 by the reprex package (v2.0.1)