I have a table like this:
customer ID startdate enddate
11 22 2015-01-01 2015-03-01
11 55 2018-04-03 2018-06-16
22 33 2017-02-01 2017-04-01
And This is the output I want:
customer Id YearMonth
11 22 201501
11 22 201502
11 22 201503
11 55 201804
11 55 201805
11 55 201806
22 33 201702
22 33 201703
22 33 201704
22 33 201505
I've Started writing this function:
datseq<-function(t1,t2) {
seq(as.Data(t1), as.Date(t2), by="month")
}
My Questions are:
a. How can I correct the function to return me YYYYMM format?
b. How can I implemnt this function on the dataframe in order that each customer and id will get the appropriate list of months? The output should be a dataframe.
Thank you
We can do this with data.table
, group by the sequence of rows, create a sequence from the 'startdate' to 'enddate', specifying the by
as monthly and format
the Date
class to return the expected format ("%Y%m"
)
library(data.table)
setDT(df1)[, .(customer = customer[1], Id = ID[1],
YearMonth = format(seq(startdate, enddate, by = '1 month'), "%Y%m")),
by = 1:nrow(df1)]
This can also be done with tidyverse
library(tidyverse)
df1 %>%
mutate(YearMonth = map2(startdate, enddate,
~ seq(.x, .y, by = "1 month") %>%
format(., format = "%Y%m"))) %>%
select(-startdate, enddate) %>%
unnest
If we need a base R
, option, then Map
can be used
lst <- Map(function(x, y) seq(x, y, by = '1 month'), df1$startdate, df1$enddate)
Replicate the rows of the dataset by the lengths
of the list
, and create a column 'YearMonth' by concatenating the list
elements and then getting the expected format
data.frame(df1[rep(1:nrow(df1), lengths(lst)), 1:2],
YearMonth = format(do.call(c, lst), "%Y%m"))