I've looked through Stack Overflow and have seen different variations of what I need, but none that I can make work for me.
I have a large set of data comprising 116 columns and 326438 rows.
I need to split each row into two, using the existing date fields to calculate two new date columns, 'StartDate' and 'EndDate'.
For example, if row 1 has a PolicyEffectiveDate of 01/06/2018 and a PolicyRenewalDate of 01/06/2019, I need the data to contain two rows as follows:
The first row would show a StartDate of 01/06/2018 and an EndDate of 31/12/2018; the second row would show a StartDate of 01/01/2019 and an EndDate of 31/05/2019. StartDate and EndDate are the new columns created in this process. All other data on the new row should match the original entry. In effect, we are creating two rows out of one, with all data matching apart from the two new fields.
What I currently have is this:
PolicyEffectiveDate  PolicyRenewalDate  Customer
2017-06-01           2018-06-01         Arc Ltd
2017-04-03           2018-04-03         Windonian CC
What I need is this:
PolicyStartDate  PolicyEndDate  Customer
2017-06-01       2017-12-31     Arc Ltd
2018-01-01       2018-05-31     Arc Ltd
2017-04-03       2017-12-31     Windonian CC
2018-01-01       2018-04-02     Windonian CC
The code to create these two example data frames is:
mydf <- data.frame(PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
                   PolicyRenewalDate = as.Date(c("2018-06-01", "2018-04-03")),
                   Customer = as.character(c("Arc Ltd", "Windonian CC")),
                   stringsAsFactors = FALSE)
newdf <- data.frame(PolicyStartDate = as.Date(c("2017-06-01", "2018-01-01", "2017-04-03", "2018-01-01")),
                    PolicyEndDate = as.Date(c("2017-12-31", "2018-05-31", "2017-12-31", "2018-04-02")),
                    Customer = as.character(c("Arc Ltd", "Arc Ltd", "Windonian CC", "Windonian CC")),
                    stringsAsFactors = FALSE)
You can use ceiling_date from lubridate:
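library(lubridate)
library(dplyr)

# First half: end each policy on 31 December of the year it starts in
mydf %>%
  mutate(PolicyRenewalDate = ceiling_date(PolicyEffectiveDate, "y") - 1) %>%
  # Second half: `.` is the piped data frame, so .$PolicyRenewalDate is the
  # year-end date; start the day after it and end the day before renewal
  bind_rows(mutate(mydf,
                   PolicyEffectiveDate = .$PolicyRenewalDate + 1,
                   PolicyRenewalDate = PolicyRenewalDate - 1
  )) %>%
  arrange(Customer) %>%
  rename(PolicyStartDate = PolicyEffectiveDate,
         PolicyEndDate = PolicyRenewalDate)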
#### OUTPUT ####
  PolicyStartDate PolicyEndDate     Customer
1      2017-06-01    2017-12-31      Arc Ltd
2      2018-01-01    2018-05-31      Arc Ltd
3      2017-04-03    2017-12-31 Windonian CC
4      2018-01-01    2018-04-02 Windonian CC
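If you would rather avoid the extra packages, the same split can be sketched in base R. This is only a rough equivalent, not the approach above: the names `policies` and `split_at_year_end` are made up for illustration, and it assumes every policy crosses exactly one year boundary (it starts in one calendar year and renews in the next), as in the example data.

```r
# Example data copied from the question; `policies` is a hypothetical name
policies <- data.frame(
  PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
  PolicyRenewalDate   = as.Date(c("2018-06-01", "2018-04-03")),
  Customer            = c("Arc Ltd", "Windonian CC"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: assumes each policy spans exactly one year boundary
split_at_year_end <- function(d) {
  # 31 December of the year in which each policy starts
  year_end <- as.Date(paste0(format(d$PolicyEffectiveDate, "%Y"), "-12-31"))

  # First piece: effective date up to the end of that year
  first <- d
  first$PolicyStartDate <- d$PolicyEffectiveDate
  first$PolicyEndDate   <- year_end

  # Second piece: 1 January up to the day before the renewal date
  second <- d
  second$PolicyStartDate <- year_end + 1
  second$PolicyEndDate   <- d$PolicyRenewalDate - 1

  out <- rbind(first, second)
  # Interleave so both pieces of a policy sit next to each other
  out <- out[order(rep(seq_len(nrow(d)), 2)), ]
  rownames(out) <- NULL

  # Drop the original date columns and put the new ones first
  keep <- setdiff(names(d), c("PolicyEffectiveDate", "PolicyRenewalDate"))
  out[, c("PolicyStartDate", "PolicyEndDate", keep)]
}

result <- split_at_year_end(policies)
print(result)
```

All other columns are carried over unchanged, so this should extend to the full 116-column data as long as the two date columns keep these names. Policies that start and renew within the same calendar year, or that span more than two, would need extra handling.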