Search code examples
rdplyrtidyverse

How to set values as NA with complex if-else criteria in R?


My df looks like this:

SERIAL  quest  time_d1_1  time_d1_2  time_d2_1  time_d2_2  STARTED              V01
F3L     d1_1   05:00      17:30      05:15      17:45      2022-01-08 05:06:19  5
F3L     d1_2   05:00      17:30      05:15      17:45      2022-01-08 17:30:07  2
F3L     d2_1   05:00      17:30      05:15      17:45      2022-01-08 8:36:54   1
F3L     d2_2   05:00      17:30      05:15      17:45      2022-01-08 18:10:07  7
7HG     d1_1   05:00      17:30      05:15      17:45      2022-01-08 05:33:15  4
7HG     d1_2   05:00      17:30      05:15      17:45      2022-01-08 18:49:22  2
7HG     d2_1   05:00      17:30      05:15      17:45      2022-01-08 07:33:15  2
7HG     d2_2   05:00      17:30      05:15      17:45      2022-01-08 18:29:22  6
  1. SERIAL = Identifier
  2. quest = [day X] _ [measurement]; "d1_1" = day one, measurement one
  3. time_d1_1 = the reference time (hh:mm) for day one, measurement 1
  4. ...
  5. time_d2_2 = the reference time (hh:mm) for day two, measurement 2
  6. STARTED = the day & time (yyyy:mm:dd hh:mm:ss) when each measurement was started
  7. V01 = some values belonging to each quest

For each row, I would like to set the variable V01 to NA, when the variable STARTED is more than an hour later than the reference time (time_d1_1 to time_d2_2), in relation to the quest variable.

Example: In row 3, SERIAL=F3L started day two, measurement one (quest=d2_1) at 8:36:54. However, the reference time (time_d2_1) is 05:15. I would now set V01=1 to NA, as 8:36:54 is larger than 05:15 + hour(1).

Unfortunately, I have this weird format so I struggle to solve this with the mutate-ifelse() or mutate-case_when() functions. Can anyone help, preferably a tidyverse solution?

The data:

Dat <- structure(list(SERIAL = c("F3L","F3L","F3L","F3L","7HG","7HG","7HG","7HG"),
                      quest = c("d1_1","d1_2","d2_1","d2_2","d1_1","d1_2","d2_1","d2_2"),
                      time_d1_1 = c("05:00","05:00","05:00","05:00","05:30","05:30","05:30","05:30"),
                      time_d1_2 = c("17:30","17:30","17:30","17:30","18:10","18:10","18:10","18:10"),
                      time_d2_1 = c("05:15","05:15","05:15","05:15","05:30","05:30","05:30","05:30"),
                      time_d2_2 = c("17:45","17:45","17:45","17:45","18:00","18:00","18:00","18:00"),
                      STARTED = c("2022-01-08 05:06:19","2022-01-08 17:30:07","2022-01-09 8:36:54",
                                  "2022-01-09 18:10:07","2021-09-04 05:33:15","2021-09-04 18:49:22",
                                  "2021-09-05 07:33:15","2021-09-05 18:29:22"),
                      V01 = c(5,3,1,7,4,2,2,6)),
                 class = "data.frame",
                 row.names = c(NA, -8L))

Solution

  • We can turn the table into long form with tidyr::pivot_longer, do the calculation, and turn into wide form (with tidyr::pivot_wider) again. See inline comments in example. I used lubridate to parse the datetime object; this could probably also be done using base R.

    library(dplyr)
    library(tidyr)
    library(lubridate)
    
    Dat %>% 
      pivot_longer(starts_with("time_"), names_prefix = "time_") %>% # turn into long form
      filter(quest == name) %>% # keep only the record where the `time_dX_Y` column matches `quest` 
      mutate(in_time = value > strftime(as_datetime(STARTED) - hours(1), "%H:%M:%S", tz = "UTC"),) %>% # calculate whether was in time
      pivot_wider(names_from = name, names_prefix = "time_", values_from = value) %>% # turn into wide form again
      group_by(SERIAL) %>%
      mutate(across(starts_with("time_"), function(x) first(x[!is.na(x)]))) # fill missings in `time_dX_Y` columns caused by the filter above
    

    Which gives

      SERIAL quest STARTED               V01 in_time time_d1_1 time_d1_2 time_d2_1 time_d2_2
      <chr>  <chr> <chr>               <dbl> <lgl>   <chr>     <chr>     <chr>     <chr>    
    1 F3L    d1_1  2022-01-08 05:06:19     5 TRUE    05:00     17:30     05:15     17:45    
    2 F3L    d1_2  2022-01-08 17:30:07     3 TRUE    05:00     17:30     05:15     17:45    
    3 F3L    d2_1  2022-01-09 8:36:54      1 FALSE   05:00     17:30     05:15     17:45    
    4 F3L    d2_2  2022-01-09 18:10:07     7 TRUE    05:00     17:30     05:15     17:45    
    5 7HG    d1_1  2021-09-04 05:33:15     4 TRUE    05:30     18:10     05:30     18:00    
    6 7HG    d1_2  2021-09-04 18:49:22     2 TRUE    05:30     18:10     05:30     18:00    
    7 7HG    d2_1  2021-09-05 07:33:15     2 FALSE   05:30     18:10     05:30     18:00    
    8 7HG    d2_2  2021-09-05 18:29:22     6 TRUE    05:30     18:10     05:30     18:00