Fill rows of a data frame by group in R


I would like to transform this data frame:

id date X
001 2022-04-12 dsf
001 2022-04-13 df
001 2022-04-14 sdw
002 2022-04-12 we3
002 2022-04-13 r45
002 2022-04-14 2s
003 2022-04-12 de65
003 2022-04-13 hjk
004 2022-04-12 vcbdf
005 2022-04-12 342f

to this one:

id date X
001 2022-04-12 dsf
001 2022-04-13 df
001 2022-04-14 sdw
002 2022-04-12 we3
002 2022-04-13 r45
002 2022-04-14 2s
003 2022-04-12 de65
003 2022-04-13 hjk
003 2022-04-14 NA
004 2022-04-12 vcbdf
004 2022-04-13 NA
004 2022-04-14 NA
005 2022-04-12 342f
005 2022-04-13 NA
005 2022-04-14 NA

The target date range is 2022-04-12 to 2022-04-14, so every id that is missing one of those dates should get a new row for it, with NA in X.

Any ideas?


Solution

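  • You can use tidyr::complete to add the missing rows. It expands the data to every combination of id and date, where the dates run from the overall minimum date to the overall maximum, and fills X with NA for the combinations that were not in the original data.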

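    library(dplyr)
    library(tidyr)
    
    df %>%
      # Convert to Date so seq() can step through calendar days
      mutate(date = as.Date(date)) %>%
      # Expand to every id x date combination; new rows get NA in X
      complete(id, date = seq(min(date), max(date), by = "day"))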
    
    # A tibble: 15 × 3
    #       id date       X    
    #   <int> <date>     <chr>
    # 1     1 2022-04-12 dsf  
    # 2     1 2022-04-13 df   
    # 3     1 2022-04-14 sdw  
    # 4     2 2022-04-12 we3  
    # 5     2 2022-04-13 r45  
    # 6     2 2022-04-14 2s   
    # 7     3 2022-04-12 de65 
    # 8     3 2022-04-13 hjk  
    # 9     3 2022-04-14 NA   
    #10     4 2022-04-12 vcbdf
    #11     4 2022-04-13 NA   
    #12     4 2022-04-14 NA   
    #13     5 2022-04-12 342f 
    #14     5 2022-04-13 NA   
    #15     5 2022-04-14 NA
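
The output above prints id as an integer, so the leading zeros from the question ("001") are gone. The following is a minimal reproducible sketch of the same approach, assuming id is stored as character and the target window is fixed to the dates named in the question. The names target_dates and filled are hypothetical and only used for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data; id is character so "001" keeps its leading zeros.
df <- tibble(
  id   = c("001", "001", "001", "002", "002", "002", "003", "003", "004", "005"),
  date = c("2022-04-12", "2022-04-13", "2022-04-14",
           "2022-04-12", "2022-04-13", "2022-04-14",
           "2022-04-12", "2022-04-13",
           "2022-04-12",
           "2022-04-12"),
  X    = c("dsf", "df", "sdw", "we3", "r45", "2s", "de65", "hjk", "vcbdf", "342f")
)

# Assumption: the target window is fixed rather than derived from the data.
target_dates <- seq(as.Date("2022-04-12"), as.Date("2022-04-14"), by = "day")

filled <- df %>%
  mutate(date = as.Date(date)) %>%
  # Every id is paired with every target date; rows that did not exist get NA in X.
  complete(id, date = target_dates)

print(filled, n = 15)
```

Passing an explicit date vector instead of seq(min(date), max(date), ...) may be preferable when no id in the data happens to cover the whole window, since the range then no longer depends on which dates are present.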