Search code examples
rdataframedate

Select most recent date in a panel RStudio


I have a data frame which is a panel: I have different CLPs (it's like the ID of an event) repeated several times, one for each phase of the event: organization, set-up, start, end, clean-up. Each phase has a starting date and final date. I would like to create a dummy equal 1 for the most recent final date of each CLP, and if it is missing, but the starting date of the following phase is not missing and more recent than 30/04/2024, then I would like to have dummy =1 for it. Date is POSIXct format.

I provide you an example of the result i'm looking for:

| CLP       | Phase   | Starting date | Final Date | Dummy|

| -------- | -------- | ------------- | ---------- | ---- |

| A        | 1        | 01/01/2019    | 29/01/2019 | 0 |

| A        | 2        | 03/04/2020    | 04/12/2022 | 1 |

| B       | 2        | 01/05/2023    |        NA   | 0 |

| B       | 3        | 03/04/2024    |        NA   | 1 |

Thanks a lot

I grouped by CLP but then I am not able to code the date problem.


Solution

  • I think we need a bigger sample to proper validate this, but it works as expected with the provided data. Check it out:

    library(tidyverse)
    
    # Toy data
    my_df <- structure(list(
      clp        = c("A", "A", "B", "B"), 
      phase      = c(1, 2, 2, 3), 
      start_date = structure(c(17897, 18355, 19478, 19816), class = "Date"), 
      end_date   = structure(c(17925, 19330, NA, NA),       class = "Date"), 
      dummy      = c(0, 1, 0, 1)),
      
      row.names  = c(NA, -4L), 
      class      = c("tbl_df", "tbl", "data.frame"))
    
    my_date <- dmy("30/04/2024")
    
    #
    new_df <- my_df %>%
      mutate(
        .by = clp, 
        my_dummy = case_when(
          any(!is.na(end_date)) & end_date == max(end_date, na.rm = TRUE)        ~ 1,
          all(is.na(end_date))  & !is.na(lag(start_date)) & start_date < my_date ~ 1,
          .default = 0))
    

    Output:

    > new_df
    # A tibble: 4 × 6
      clp   phase start_date end_date   dummy my_dummy
      <chr> <dbl> <date>     <date>     <dbl>    <dbl>
    1 A         1 2019-01-01 2019-01-29     0        0
    2 A         2 2020-04-03 2022-12-04     1        1
    3 B         2 2023-05-01 NA             0        0
    4 B         3 2024-04-03 NA             1        1
    

    Created on 2024-05-29 with reprex v2.1.0