Search code examples
rdataframeif-statementcase

How to reassign values based on grouping and dates (R)


I have a dataframe that looks like the following:

grp1 <- c('A','A','A','A','A','A',
          'B','B','B','B','B','B',
          'C','C','C','C','C','C')
grp2 <- c('PQ2','PQ2','PQ2','PQ2','PQ2','PQ2',
          'PL2','PL2','PL2','PL2','PL2','PL2',
          'PN2','PN2','PN2','PN2','PN2','PN2')
grp3 <- c('ML2','ML2','ML2','ML2','ML2','ML2',
          'MP2','MP2','MP2','MP2','MP2','MP2',
          'MO2','MO2','MO2','MO2','MO2','MO2')
date <- c('2022-04-04','2022-04-05','2022-04-11','2022-04-12','2022-04-18','2022-04-19',
          '2022-04-04','2022-04-05','2022-04-11','2022-04-12','2022-04-18','2022-04-19',
          '2022-04-04','2022-04-05','2022-04-11','2022-04-12','2022-04-18','2022-04-19')

count <- c(1206, 1432, 333, 292, 1120, 1345, 1100, 1150, 245, 423, 1500, 1400, 1300, 1200, 400, 402, 1100, 1107)

df <- data.frame(cbind(grp1, grp2, grp3, date, count))

For each group, I want to reassign the count values that fall on April 11th to match the values that were seen on April 4th. And the same thing for April 12th values - I want them to be reassigned to the values that were seen on April 5th.

I know I need to use a case statement or ifelse function to make this work, but I am a little stuck on the logic for how to write this out. Is there an easier way or a package that does this in one step?


Solution

  • I am not 100% certain how you define a group, so I am assuming here it is all columns that start with "grp".

    library(dplyr)
    
    df |>
      mutate(date = as.Date(date)) |>
      group_by(pick(starts_with("grp"))) |>
      mutate(count = coalesce(count[match(date - 7, date)], count)) |>
      ungroup()
    

    This works by trying to find a date 7 days prior and taking the corresponding count value, if available.


    Alternatively, you can do:

    df |>
      mutate(date = as.Date(date)) |>
      group_by(pick(starts_with("grp"))) |>
      mutate(count = case_match(date, 
                                as.Date("2022-04-11") ~ count[match("2022-04-04", date)],
                                as.Date("2022-04-12") ~ count[match("2022-04-05", date)],
                                .default = count)) |>
      ungroup()
    

    Output

       grp1  grp2  grp3  date       count
       <chr> <chr> <chr> <date>     <chr>
     1 A     PQ2   ML2   2022-04-04 1206 
     2 A     PQ2   ML2   2022-04-05 1432 
     3 A     PQ2   ML2   2022-04-11 1206 
     4 A     PQ2   ML2   2022-04-12 1432 
     5 A     PQ2   ML2   2022-04-18 333  
     6 A     PQ2   ML2   2022-04-19 292  
     7 B     PL2   MP2   2022-04-04 1100 
     8 B     PL2   MP2   2022-04-05 1150 
     9 B     PL2   MP2   2022-04-11 1100 
    10 B     PL2   MP2   2022-04-12 1150 
    11 B     PL2   MP2   2022-04-18 245  
    12 B     PL2   MP2   2022-04-19 423  
    13 C     PN2   MO2   2022-04-04 1300 
    14 C     PN2   MO2   2022-04-05 1200 
    15 C     PN2   MO2   2022-04-11 1300 
    16 C     PN2   MO2   2022-04-12 1200 
    17 C     PN2   MO2   2022-04-18 400  
    18 C     PN2   MO2   2022-04-19 402