Search code examples
rdataframefill

Fill NA until certain date based on different column per group


I have the following dataframe df (dput below):

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07    NA
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06    NA
7      B 2022-01-04 2022-01-06    NA
8      B 2022-01-06 2022-01-06    NA
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

I would like to fill the NA values per group until date2. So all NA values after the first row with a value (group A with value 1) should be filled until date2 if date1 is less. This is the desired output:

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07     1
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06     3
7      B 2022-01-04 2022-01-06     3
8      B 2022-01-06 2022-01-06     3
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

As you can see in group A only the next row is filled with 1 because the last date1 is after date2. So I was wondering if anyone knows how to fill the NA based on a certain date per group?


dput df:

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), date1 = c("2022-01-01", "2022-01-02", "2022-01-04", 
"2022-01-10", "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-06", 
"2022-01-01", "2022-01-03", "2022-01-04", "2022-01-11"), date2 = c("2022-01-07", 
"2022-01-07", "2022-01-07", "2022-01-07", "2022-01-06", "2022-01-06", 
"2022-01-06", "2022-01-06", "2022-01-09", "2022-01-09", "2022-01-09", 
"2022-01-09"), value = c(NA, 1, NA, NA, 3, NA, NA, NA, NA, NA, 
2, NA)), class = "data.frame", row.names = c(NA, -12L))

Solution

  • Create an additional group and use fill

    library(dplyr)
    library(tidyr)
    df %>% 
      group_by(group, after = date1 > date2) %>% 
      fill(value) %>% 
      ungroup() %>% 
      select(-after)
    
    # A tibble: 12 × 4
       group date1      date2      value
       <chr> <chr>      <chr>      <dbl>
     1 A     2022-01-01 2022-01-07    NA
     2 A     2022-01-02 2022-01-07     1
     3 A     2022-01-04 2022-01-07     1
     4 A     2022-01-10 2022-01-07    NA
     5 B     2022-01-01 2022-01-06     3
     6 B     2022-01-03 2022-01-06     3
     7 B     2022-01-04 2022-01-06     3
     8 B     2022-01-06 2022-01-06     3
     9 C     2022-01-01 2022-01-09    NA
    10 C     2022-01-03 2022-01-09    NA
    11 C     2022-01-04 2022-01-09     2
    12 C     2022-01-11 2022-01-09    NA