I have a data frame which is a panel: I have different CLPs (it's like the ID of an event) repeated several times, one for each phase of the event: organization, set-up, start, end, clean-up. Each phase has a starting date and final date. I would like to create a dummy equal 1 for the most recent final date of each CLP, and if it is missing, but the starting date of the following phase is not missing and more recent than 30/04/2024, then I would like to have dummy =1 for it. Date is POSIXct format.
I provide you an example of the result i'm looking for:
| CLP | Phase | Starting date | Final Date | Dummy|
| -------- | -------- | ------------- | ---------- | ---- |
| A | 1 | 01/01/2019 | 29/01/2019 | 0 |
| A | 2 | 03/04/2020 | 04/12/2022 | 1 |
| B | 2 | 01/05/2023 | NA | 0 |
| B | 3 | 03/04/2024 | NA | 1 |
Thanks a lot
I grouped by CLP but then I am not able to code the date problem.
I think we need a bigger sample to proper validate this, but it works as expected with the provided data. Check it out:
library(tidyverse)
# Toy data
my_df <- structure(list(
clp = c("A", "A", "B", "B"),
phase = c(1, 2, 2, 3),
start_date = structure(c(17897, 18355, 19478, 19816), class = "Date"),
end_date = structure(c(17925, 19330, NA, NA), class = "Date"),
dummy = c(0, 1, 0, 1)),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame"))
my_date <- dmy("30/04/2024")
#
new_df <- my_df %>%
mutate(
.by = clp,
my_dummy = case_when(
any(!is.na(end_date)) & end_date == max(end_date, na.rm = TRUE) ~ 1,
all(is.na(end_date)) & !is.na(lag(start_date)) & start_date < my_date ~ 1,
.default = 0))
Output:
> new_df
# A tibble: 4 × 6
clp phase start_date end_date dummy my_dummy
<chr> <dbl> <date> <date> <dbl> <dbl>
1 A 1 2019-01-01 2019-01-29 0 0
2 A 2 2020-04-03 2022-12-04 1 1
3 B 2 2023-05-01 NA 0 0
4 B 3 2024-04-03 NA 1 1
Created on 2024-05-29 with reprex v2.1.0