Split each row into two and add new columns based on dates in R


I've looked through Stack Overflow and have seen different variations of what I need, but none that I can make work for me.

I have a large set of data comprising 116 columns and 326438 rows.

I need each row to be split into two at the end of the calendar year, using the existing date fields for the calculation, with new date columns 'StartDate' and 'EndDate' added.

If row 1 shows a PolicyEffectiveDate of 01/06/2018 and a PolicyRenewalDate of 01/06/2019, I need the data to reflect two rows as follows:

The first row would show a StartDate of 01/06/2018 and an EndDate of 31/12/2018; the next row would show a StartDate of 01/01/2019 and an EndDate of 31/05/2019. StartDate and EndDate are the new columns created in this process. All other data on the new row should match the first entry. In effect, we are creating two rows out of one, with all data matching apart from the two new fields.

What I currently have is this:

PolicyEffectiveDate  PolicyRenewalDate  Customer
2017-06-01           2018-06-01         Arc Ltd
2017-04-03           2018-04-03         Windonian CC

What I need is this:

PolicyStartDate  PolicyEndDate  Customer
2017-06-01       2017-12-31     Arc Ltd
2018-01-01       2018-05-31     Arc Ltd
2017-04-03       2017-12-31     Windonian CC
2018-01-01       2018-04-02     Windonian CC

The code to make these two example data frames is:

mydf <- data.frame(PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
               PolicyRenewalDate = as.Date(c("2018-06-01", "2018-04-03")),
               Customer = as.character(c("Arc Ltd", "Windonian CC")),
               stringsAsFactors = FALSE)

newdf <- data.frame(PolicyStartDate = as.Date(c("2017-06-01", "2018-01-01","2017-04-03", "2018-01-01")),
               PolicyEndDate = as.Date(c("2017-12-31", "2018-05-31","2017-12-31", "2018-04-02")),
               Customer = as.character(c("Arc Ltd","Arc Ltd", "Windonian CC","Windonian CC")),
               stringsAsFactors = FALSE)

Solution

  • You can use ceiling_date from lubridate:

    library(lubridate)
    library(dplyr)
    
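    # mydf is the example data frame from the question.
    mydf %>%
        # First half: end the period on 31 December of the start year.
        mutate(PolicyRenewalDate = ceiling_date(PolicyEffectiveDate, "year") - 1) %>%
        # Second half: start the day after that year-end (`.` is the piped-in
        # first half) and end the day before the original renewal date.
        bind_rows(mutate(mydf, 
                         PolicyEffectiveDate = .$PolicyRenewalDate + 1,
                         PolicyRenewalDate = PolicyRenewalDate - 1
                         )) %>%
        arrange(Customer) %>%
        rename(PolicyStartDate = PolicyEffectiveDate, 
               PolicyEndDate = PolicyRenewalDate)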
    
    #### OUTPUT ####
    
      PolicyStartDate PolicyEndDate     Customer
    1      2017-06-01    2017-12-31      Arc Ltd
    2      2018-01-01    2018-05-31      Arc Ltd
    3      2017-04-03    2017-12-31 Windonian CC
    4      2018-01-01    2018-04-02 Windonian CC
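
    If you would rather avoid the extra packages, the same idea can be sketched in base R. This is only an illustration, not a tested drop-in for the full 116-column data: split_at_year_end is a hypothetical helper name, and it assumes every policy crosses exactly one 31 December (a policy that starts and ends in the same calendar year would get a second row whose start is after its end).

```r
# Example input from the question.
mydf <- data.frame(
  PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
  PolicyRenewalDate   = as.Date(c("2018-06-01", "2018-04-03")),
  Customer            = c("Arc Ltd", "Windonian CC"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper (not from any package): splits each policy at the end
# of the calendar year in which it starts.
# Assumption: every policy crosses exactly one 31 December.
split_at_year_end <- function(d) {
  # 31 December of the year in which each policy starts.
  year_end <- as.Date(paste0(format(d$PolicyEffectiveDate, "%Y"), "-12-31"))

  # First half: effective date up to year-end.
  first <- d
  first$PolicyStartDate <- d$PolicyEffectiveDate
  first$PolicyEndDate   <- year_end

  # Second half: 1 January up to the day before renewal.
  second <- d
  second$PolicyStartDate <- year_end + 1
  second$PolicyEndDate   <- d$PolicyRenewalDate - 1

  out <- rbind(first, second)
  # Interleave so both halves of each original row sit next to each other.
  out <- out[order(rep(seq_len(nrow(d)), 2)), ]

  # Drop the original date columns and put the new ones first.
  out$PolicyEffectiveDate <- NULL
  out$PolicyRenewalDate   <- NULL
  new_cols <- c("PolicyStartDate", "PolicyEndDate")
  out <- out[, c(new_cols, setdiff(names(out), new_cols))]
  rownames(out) <- NULL
  out
}

result <- split_at_year_end(mydf)
print(result)
```

    All other columns are carried over unchanged because each half starts as a copy of the input, so the same function should apply to a wider data frame as long as the two date column names match.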