employee <- c("John", "Adi", "Sam")
salary <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01","2014-01-01","2014-10-01"))
enddate <- as.Date(c("2015-10-31","2014-12-31","2015-10-31"))
N<- c(2,1,2)
df<- data.frame(employee,salary, startdate, enddate, N)
I want to repeat the entire row by "n" number of times where "n" is specified in the column N but I want to change the enddate in the original row to be fixed date such as "31/12/2014" and make this fixed date as startdate in the repeating row. Run the code to see the resultant example (expected output)in df2:
employee <- c(rep("John",2), "Adi", rep("Sam",2))
salary <- c(21000,21000, 22000, 23000,23000)
startdate <- as.Date(c("2014-11-01","2014-12-31", "2014-01-01","2014-10-01","2014-12-31"))
enddate <- as.Date(c("2014-12-31","2015-10-31","2014-12-31","2014-12-31","2015-10-31"))
N<- c(2,2,1,2,2)
df2<- data.frame(employee,salary, startdate, enddate, N)
We could do this using data.table
. We convert the 'data.frame' to 'data.table' (setDT(df)
), expand the rows by replicating the 'N' variable. We get the numeric index ('i1') of the observation (.I[1L]
) grouped by 'employee', use that to assign (:=
) the 'enddate' with '2014-12-31'. Similarly, we get the row index ('i2') of 2nd to last element (.I[seq_len(.N)>1L]
) for each 'employee' and set the 'startdate' as '2014-12-31'.
DT <- setDT(df)[rep(seq_len(.N), N)]
i1 <- DT[, .I[1L] , by = employee]$V1
DT[i1, enddate:= as.Date('2014-12-31')]
i2 <- DT[, .I[seq_len(.N)>1L], employee]$V1
DT[i2, startdate:= as.Date('2014-12-31')]
identical(as.data.table(df2), DT)
#[1] TRUE
Or we can do this using if
and concatenate the '2014-12-31' for the 'startdate' and 'enddate', grouped by 'employee' and then assign the output back to the columns 'startdate', 'enddate'.
DT[, c('startdate', 'enddate') := if(.N>1L)
list(c(startdate[1L], as.Date('2014-12-31')),
c(as.Date('2014-12-31'), enddate[.N])) , by = employee]
identical(DT, as.data.table(df2))
#[1] TRUE