Search code examples
rdplyrtidyverselubridate

Adding rows (with dates) for observations that do not exist in the dataframe


I have a simple question for data in the format below (date is dd/m/yy). Code for data input is below the question.

#>      date  a  b
#> 1 25/1/20 10 20
#> 2 26/1/20 20 40

How can I add rows for all of January, with 0 values for a and b?

My desired output is:

#>      date  a  b
#> 1 01/1/20  0  0
#> 2 02/1/20  0  0
#> ...
#> ...
#>   25/1/20 10 20
#>   26/1/20 20 40

Tidyverse answers are preferred, but I'm happy with any solution.

Code for data input:

df <- data.frame(
  stringsAsFactors = FALSE,
              date = c("25/1/20", "26/1/20"),
                 a = c(10L, 20L),
                 b = c(20L, 40L)
)

Solution

  • You could use a new dataframe and then join. The key is to create a new dataframe based on the max date on your df and define a sequence since the first day. Here the code:

    library(dplyr)
    library(lubridate)
    #Data
    df <- data.frame(
      stringsAsFactors = FALSE,
      date = c("25/1/20", "26/1/20"),
      a = c(10L, 20L),
      b = c(20L, 40L)
    )
    #Code 1
    df %>% mutate(date=dmy(date)) -> df
    #Identify max date
    maxdate <- max(df$date)
    #Set initial day
    startday <- as.Date(paste0(format(maxdate,'%Y-%m'),'-01'))
    #New data frame
    ndf <- data.frame(date=seq(startday,maxdate,by=1))
    #Join
    ndf %>% left_join(df) %>% replace(is.na(.),0)
    

    Output:

             date  a  b
    1  2020-01-01  0  0
    2  2020-01-02  0  0
    3  2020-01-03  0  0
    4  2020-01-04  0  0
    5  2020-01-05  0  0
    6  2020-01-06  0  0
    7  2020-01-07  0  0
    8  2020-01-08  0  0
    9  2020-01-09  0  0
    10 2020-01-10  0  0
    11 2020-01-11  0  0
    12 2020-01-12  0  0
    13 2020-01-13  0  0
    14 2020-01-14  0  0
    15 2020-01-15  0  0
    16 2020-01-16  0  0
    17 2020-01-17  0  0
    18 2020-01-18  0  0
    19 2020-01-19  0  0
    20 2020-01-20  0  0
    21 2020-01-21  0  0
    22 2020-01-22  0  0
    23 2020-01-23  0  0
    24 2020-01-24  0  0
    25 2020-01-25 10 20
    26 2020-01-26 20 40