Tags: r, datetime-series, lubridate, padr

Padding around dates in R to add missing/blank months?


The padr R package vignette describes several functions for padding a data set with the dates and times that are missing from it.

I often tally events in data frames (i.e., with dplyr::count()) and then need to plot the occurrences over a period of, say, one year. When I count the events in a low-volume data frame, I often get a single-row result, like this:

library(tidyverse)
library(lubridate)
library(padr)
df <- tibble(col1 = as.Date("2018-10-01"), col2 = "g", col3 = 5)
df

#> # A tibble: 1 x 3
#>   col1       col2   col3
#>   <date>     <chr> <dbl>
#> 1 2018-10-01 g         5
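For context, the tallying step described above might look like the following minimal sketch. The event log `events` and its columns are hypothetical; `lubridate::floor_date()` rounds each date down to the first day of its month before `dplyr::count()` tallies it, which is how a sparse monthly summary like the one-row tibble above arises.

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical event log: one row per event, all in October 2018
events <- tibble(
  date = as.Date(c("2018-10-03", "2018-10-15", "2018-10-15",
                   "2018-10-20", "2018-10-28")),
  type = "g"
)

# Round each date down to the first of its month, then tally per month and type
monthly <- events %>%
  mutate(month = floor_date(date, unit = "month")) %>%
  count(month, type)

monthly
```

Because every event falls in the same month, the result has a single row, and months with no events simply do not appear.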

Plotting this with ggplot over a year, month by month, requires a data frame with 12 rows. It needs to look like this:

#> # A tibble: 12 x 3
#>   col1       col2   col3
#>   <date>     <chr> <dbl>
#>  1 2018-01-01 NA        0
#>  2 2018-02-01 NA        0
#>  3 2018-03-01 NA        0
#>  4 2018-04-01 NA        0
#>  5 2018-05-01 NA        0
#>  6 2018-06-01 NA        0
#>  7 2018-07-01 NA        0
#>  8 2018-08-01 NA        0
#>  9 2018-09-01 NA        0
#> 10 2018-10-01 g         5
#> 11 2018-11-01 NA        0
#> 12 2018-12-01 NA        0
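For comparison, one non-padr way to get such a frame without typing the months by hand is `tidyr::complete()` with an explicit vector of month starts. This is a sketch that hard-codes 2018 for illustration; `all_months` and `df_full` are names introduced here, not part of the question.

```r
library(dplyr)
library(tidyr)

df <- tibble(col1 = as.Date("2018-10-01"), col2 = "g", col3 = 5)

# First day of every month in 2018 (12 values)
all_months <- seq(as.Date("2018-01-01"), as.Date("2018-12-01"), by = "month")

# complete() adds a row for each value of col1 that is missing from df;
# fill sets col3 to 0 in those rows, while col2 is left as NA
df_full <- df %>%
  complete(col1 = all_months, fill = list(col3 = 0))

df_full
```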

Perhaps padr can do this with some combination of its thicken() and pad() functions. My attempts are shown below; neither line 3 nor line 4 produces the data frame shown directly above.

How do I construct the data frame shown directly above, using padr, lubridate, the tidyverse, data.table, base R, or any other approach you like? Entering each month manually is not an option, if that needs to be said. Thank you.

df %>% 
  thicken("year") %>% 
  # pad(by = "col1") %>%       # line 3
  # pad(by = "col1_year") %>%  # line 4
  print()
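A likely reason neither attempt works: by default `pad()` only fills the gaps between the smallest and largest date present, and here there is a single date (or a single `col1_year` value after `thicken("year")`), so there is nothing to fill, and `pad()` may not be able to infer an interval from one value at all. A minimal padr-only sketch, assuming a padr version whose `pad()` accepts `interval`, `start_val` and `end_val` and which provides `fill_by_value()`, supplies the interval and the range explicitly:

```r
library(dplyr)
library(padr)

df <- tibble(col1 = as.Date("2018-10-01"), col2 = "g", col3 = 5)

# With only one observation, give pad() the interval and the full range
# explicitly (assumption: these arguments exist in the installed padr version)
df_padded <- df %>%
  pad(interval  = "month",
      start_val = as.Date("2018-01-01"),
      end_val   = as.Date("2018-12-01")) %>%
  fill_by_value(col3, value = 0)   # added rows get col3 = 0; col2 stays NA

df_padded
```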

Solution

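library(lubridate)
library(tidyverse)

df <- tibble(col1 = as.Date("2018-10-01"), col2 = "g", col3 = 5)

# Year of the first observed date
my_year <- year(df$col1[1])

# One row per month: the first day of each month of that year
df2 <- tibble(col1 = seq(ymd(paste0(my_year, "-01-01")),
                         ymd(paste0(my_year, "-12-01")),
                         by = "1 month"))

# Keep all 12 months (all.y = TRUE), then replace missing counts with 0
df3 <- merge(df, df2, by = "col1", all.y = TRUE) %>%
  mutate(col3 = replace_na(col3, 0))

df3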
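Since the question also allows base R, here is a minimal sketch of the same merge-based idea without tidyverse dependencies. It wraps the steps in a hypothetical helper, `pad_months()`, so the range is a parameter instead of being derived from the first row. It assumes the dates are already first-of-month values, as in `df`; rows whose dates fall outside `[from, to]` are dropped because only the skeleton's rows are kept.

```r
# pad_months() is a hypothetical helper introduced here for illustration
pad_months <- function(data, date_col, from, to, fill_col, fill_value = 0) {
  # Skeleton with the first day of every month in [from, to]
  months <- data.frame(seq(as.Date(from), as.Date(to), by = "month"))
  names(months) <- date_col

  # Left-join the data onto the skeleton so every month is kept;
  # merge() sorts the result by the key column by default
  out <- merge(months, data, by = date_col, all.x = TRUE)

  # Months with no events get fill_value instead of NA
  out[[fill_col]][is.na(out[[fill_col]])] <- fill_value
  out
}

df <- data.frame(col1 = as.Date("2018-10-01"), col2 = "g", col3 = 5)
df3 <- pad_months(df, "col1", "2018-01-01", "2018-12-01", fill_col = "col3")
df3
```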