R : Repeat rows in a data frame with new set of values for certain columns

employee <- c("John", "Adi", "Sam")
salary <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01","2014-01-01","2014-10-01"))
enddate <- as.Date(c("2015-10-31","2014-12-31","2015-10-31"))
N<- c(2,1,2)
df<- data.frame(employee,salary, startdate, enddate, N)

I want to repeat the entire row by "n" number of times where "n" is specified in the column N but I want to change the enddate in the original row to be fixed date such as "31/12/2014" and make this fixed date as startdate in the repeating row. Run the code to see the resultant example (expected output)in df2:

employee <- c(rep("John",2), "Adi", rep("Sam",2))
salary <- c(21000,21000, 22000, 23000,23000)
startdate <- as.Date(c("2014-11-01","2014-12-31", "2014-01-01","2014-10-01","2014-12-31"))
enddate <- as.Date(c("2014-12-31","2015-10-31","2014-12-31","2014-12-31","2015-10-31"))
N<- c(2,2,1,2,2)
df2<- data.frame(employee,salary, startdate, enddate, N)

Solution

We could do this using data.table. We convert the 'data.frame' to 'data.table' (setDT(df)), expand the rows by replicating the 'N' variable. We get the numeric index ('i1') of the observation (.I[1L]) grouped by 'employee', use that to assign (:=) the 'enddate' with '2014-12-31'. Similarly, we get the row index ('i2') of 2nd to last element (.I[seq_len(.N)>1L]) for each 'employee' and set the 'startdate' as '2014-12-31'.

DT <- setDT(df)[rep(seq_len(.N), N)]
i1 <- DT[,  .I[1L] , by = employee]$V1
DT[i1, enddate:= as.Date('2014-12-31')]
i2 <- DT[, .I[seq_len(.N)>1L], employee]$V1
DT[i2, startdate:= as.Date('2014-12-31')]
identical(as.data.table(df2), DT)
#[1] TRUE

Or we can do this using if and concatenate the '2014-12-31' for the 'startdate' and 'enddate', grouped by 'employee' and then assign the output back to the columns 'startdate', 'enddate'.

DT[, c('startdate', 'enddate') := if(.N>1L) 
            list(c(startdate[1L], as.Date('2014-12-31')),
                 c(as.Date('2014-12-31'), enddate[.N])) , by = employee]
identical(DT, as.data.table(df2))
#[1] TRUE