Search code examples
rdataframetidyr

How to extend calendar periods to complete a dataframe in R?


The code posted at the bottom does a nice job of filling in a dataframe, using package tidyr, so that all ID's end up with the same number of periods, in the case of period defined as number of months ("Period_1" in the below code). Base dataframe testDF has ID of 1 with 5 periods, and ID of 50 and 60 with only 3 periods each. The tidyr code creates additional periods ("Period_1") for ID of 50 and 60 so they too have 5 Period_1´s. The code copies down the "Bal" and "State" fields so that all ID end up with the same number of Period_1, which is correct.

However, how would I extend the calendar month expression of "Period_2" in the same manner, as illustrated immediately below?

enter image description here

Code:

library(tidyr)

testDF <-
  data.frame(
    ID = as.numeric(c(rep(1,5),rep(50,3),rep(60,3))),
    Period_1 = as.numeric(c(1:5,1:3,1:3)),
    Period_2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-01","2012-02","2012-03"),
    Bal = as.numeric(c(rep(10,5),21:23,36:34)),
    State = c("XX","AA","BB","CC","XX","AA","BB","CC","SS","XX","AA")
  )

testDFextend <-
  testDF %>%
  tidyr::complete(ID, nesting(Period_1)) %>%
  tidyr::fill(Bal, State, .direction = "down")

testDFextend

Edit: rolling from one year to the next

A better OP example would have Period 2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-10","2012-11","2012-12"), providing an example whereby extending Period_2 causes a rollover to the next year. Below I add to the tidyr/dplyr answer below to correctly roll over the year:

library(tidyr)
library(dplyr)

testDF <-
  data.frame(
    ID = as.numeric(c(rep(1,5),rep(50,3),rep(60,3))),
    Period_1 = as.numeric(c(1:5,1:3,1:3)),
    Period_2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-10","2012-11","2012-12"),
    Bal = as.numeric(c(rep(10,5),21:23,36:34)),
    State = c("XX","AA","BB","CC","XX","AA","BB","CC","SS","XX","AA")
  )

testDFextend <-
  testDF %>%
  tidyr::complete(ID, nesting(Period_1)) %>%
  tidyr::fill(Bal, State, .direction = "down")

testDFextend %>% 
  separate(Period_2, into = c("year", "month"), convert = TRUE) %>% 
  fill(year) %>% 
  group_by(ID) %>% 
  mutate(month = sprintf("%02d", zoo::na.spline(month))) %>% 
  unite("Period_2", year, month, sep = "-") %>% 
# Now I add the below lines:
  separate(Period_2, into = c("year", "month"), convert = TRUE) %>% 
  mutate(month = as.integer(sprintf("%02d", zoo::na.spline(month)))) %>%
  mutate(year1 = ifelse(month > 12, year+trunc(month/12), year)) %>%
  mutate(month1 = ifelse(month > 12 & month%%12!= 0, month%%12, month)) %>%
  mutate(month1 = ifelse(month1 < 10, paste0(0,month1),month1)) %>%
  unite("Period_2", year1, month1, sep = "-") %>%
  select("ID","Period_1","Period_2","Bal","State")

Solution

  • I think the nicest way to do this is to make use of the padr package, which is built to pad data.frames where there are missing/incomplete columns.

    This uses grouping and cur_data() to make the correct date sequence in Period_2.

    library(dplyr)
    library(tidyr)
    library(padr)
    
    n_periods <- 5
    
    testDF %>%
      pad_int(end_val = n_periods , by = "Period_1", group = "ID") %>%
      group_by(ID) %>%
      mutate(Period_2 = as.Date(paste0(Period_2, "-01"))) %>%
      mutate(Period_2 = seq(cur_data()$Period_2[1], by = "months", length.out = 
        n_periods) %>% format("%Y-%m")) %>%
      fill(Bal, State) %>%
      ungroup() %>%
      select(ID, Period_1, Period_2, Bal, State)    
    
          ID Period_1 Period_2   Bal State
       <dbl>    <dbl> <chr>    <dbl> <chr>
     1     1        1 2012-06     10 XX   
     2     1        2 2012-07     10 AA   
     3     1        3 2012-08     10 BB   
     4     1        4 2012-09     10 CC   
     5     1        5 2012-10     10 XX   
     6    50        1 2013-06     21 AA   
     7    50        2 2013-07     22 BB   
     8    50        3 2013-08     23 CC   
     9    50        4 2013-09     23 CC   
    10    50        5 2013-10     23 CC   
    11    60        1 2012-01     36 SS   
    12    60        2 2012-02     35 XX   
    13    60        3 2012-03     34 AA   
    14    60        4 2012-04     34 AA   
    15    60        5 2012-05     34 AA 
    

    Note that this will handle cases when the year rolls over to the next year during Period_2.

    Finally, you could adjust n_periods if you needed a different number of periods (or use a function to figure it out automatically, like jay.sf's answer).