Search code examples
rdata-analysis

Expand data set row in R


I've got a table like this:

| Activation Month | Disabled Month | Month.Fee | Custr
| 21/4/2018        | N/A            | 10        |     A    
| 21/3/2018        | 21/6/2018      | 20        |     B

I want to transform this table base on validity range, in column Activation and Disabled, in order to have 1 entry per months, (assuming today is 30/11/2018) like:

Month | Enrolled  |  Activation Month | Disabled Month | Month.Fee | Cust.
  1   |     N     | 21/4/2018         | N/A            | 10        | A
  2   |     N     | 21/4/2018         | N/A            | 10        | A
  3   |     N     | 21/4/2018         | N/A            | 10        | A
  4   |     Y     | 21/4/2018         | N/A            | 10        | A
  5   |     Y     | 21/4/2018         | N/A            | 10        | A
  6   |     Y     | 21/4/2018         | N/A            | 10        | A
  7   |     Y     | 21/4/2018         | N/A            | 10        | A
  8   |     Y     | 21/4/2018         | N/A            | 10        | A
  9   |     Y     | 21/4/2018         | N/A            | 10        | A
  10  |     Y     | 21/4/2018         | N/A            | 10        | A
  11  |     Y     | 21/4/2018         | N/A            | 10        | A
  12  |     Y     | 21/4/2018         | N/A            | 10        | A
  1   |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  2   |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  3   |     Y     | 21/3/2018         | 21/6/2018      | 10        | B
  4   |     Y     | 21/3/2018         | 21/6/2018      | 10        | B
  5   |     Y     | 21/3/2018         | 21/6/2018      | 10        | B
  6   |     Y     | 21/3/2018         | 21/6/2018      | 10        | B
  7   |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  8   |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  9   |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  10  |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  11  |     N     | 21/3/2018         | 21/6/2018      | 10        | B
  12  |     N     | 21/3/2018         | 21/6/2018      | 10        | B

Is there any way to manage this without loops?


Solution

  • One option would be

    library(tidyverse)
    df1 %>% 
      mutate(Enrolled = map2(Activation.Month, Disabled.Month, ~ {
                x1 <- month(dmy(.x))
                x2 <- month(dmy(.y))
                ind <- if(is.na(x2)) x1:12 else x1:x2
                1:12 %in% ind})) %>%
      unnest %>%
      mutate(Month = rep(1:12, length.out = n()))
    #    Activation.Month Disabled.Month Month.Fee Custr Enrolled Month
    #1         21/4/2018           <NA>        10     A    FALSE     1
    #2         21/4/2018           <NA>        10     A    FALSE     2
    #3         21/4/2018           <NA>        10     A    FALSE     3
    #4         21/4/2018           <NA>        10     A     TRUE     4
    #5         21/4/2018           <NA>        10     A     TRUE     5
    #6         21/4/2018           <NA>        10     A     TRUE     6
    #7         21/4/2018           <NA>        10     A     TRUE     7
    #8         21/4/2018           <NA>        10     A     TRUE     8
    #9         21/4/2018           <NA>        10     A     TRUE     9
    #10        21/4/2018           <NA>        10     A     TRUE    10
    #11        21/4/2018           <NA>        10     A     TRUE    11
    #12        21/4/2018           <NA>        10     A     TRUE    12
    #13        21/3/2018      21/6/2018        20     B    FALSE     1
    #14        21/3/2018      21/6/2018        20     B    FALSE     2
    #15        21/3/2018      21/6/2018        20     B     TRUE     3
    #16        21/3/2018      21/6/2018        20     B     TRUE     4
    #17        21/3/2018      21/6/2018        20     B     TRUE     5
    #18        21/3/2018      21/6/2018        20     B     TRUE     6
    #19        21/3/2018      21/6/2018        20     B    FALSE     7
    #20        21/3/2018      21/6/2018        20     B    FALSE     8
    #21        21/3/2018      21/6/2018        20     B    FALSE     9
    #22        21/3/2018      21/6/2018        20     B    FALSE    10
    #23        21/3/2018      21/6/2018        20     B    FALSE    11
    #24        21/3/2018      21/6/2018        20     B    FALSE    12
    

    NOTE: Here, we are using TRUE/FALSE for the 'Enrolled' instead of 'Y/N' as it becomes easier to subset with a logical column

    data

    df1 <- structure(list(Activation.Month = c("21/4/2018", "21/3/2018"), 
    Disabled.Month = c(NA, "21/6/2018"), Month.Fee = c(10L, 20L
    ), Custr = c("A", "B")), class = "data.frame", row.names = c(NA, 
    -2L))