Search code examples
rapplylapplysapplytapply

apply Function one raw input ,many raws output


I have a table like this:

customer   ID    startdate   enddate
11         22   2015-01-01  2015-03-01
11         55   2018-04-03  2018-06-16
22         33   2017-02-01  2017-04-01

And This is the output I want:

   customer    Id  YearMonth
    11         22   201501
    11         22   201502
    11         22   201503
    11         55   201804
    11         55   201805
    11         55   201806
    22         33   201702
    22         33   201703
    22         33   201704
    22         33   201505

I've Started writing this function:

datseq<-function(t1,t2) {
seq(as.Data(t1), as.Date(t2), by="month")
}

My Questions are:

a. How can I correct the function to return me YYYYMM format?

b. How can I implemnt this function on the dataframe in order that each customer and id will get the appropriate list of months? The output should be a dataframe.

Thank you


Solution

  • We can do this with data.table, group by the sequence of rows, create a sequence from the 'startdate' to 'enddate', specifying the by as monthly and format the Date class to return the expected format ("%Y%m")

    library(data.table)
    setDT(df1)[, .(customer = customer[1], Id = ID[1], 
     YearMonth = format(seq(startdate, enddate, by = '1 month'), "%Y%m")),
          by = 1:nrow(df1)]
    

    This can also be done with tidyverse

    library(tidyverse)
    df1 %>% 
      mutate(YearMonth = map2(startdate, enddate, 
           ~ seq(.x, .y, by = "1 month") %>%
                  format(., format = "%Y%m"))) %>% 
      select(-startdate, enddate) %>% 
      unnest
    

    If we need a base R, option, then Map can be used

    lst <- Map(function(x, y) seq(x, y, by = '1 month'), df1$startdate, df1$enddate)
    

    Replicate the rows of the dataset by the lengths of the list, and create a column 'YearMonth' by concatenating the list elements and then getting the expected format

    data.frame(df1[rep(1:nrow(df1), lengths(lst)), 1:2], 
               YearMonth = format(do.call(c, lst), "%Y%m"))