I would like to "expand" a dataframe, duplicating the information on some columns the number of times indicated by a fifth column.
What would the most efficiency to achieve this task with R? (Open to Data Table or Dplyer, reshape solutions).
f_1 f_2 d_1 d_2 i_1
1: 1 A 2016-01-01 <NA> NA
2: 2 A 2016-01-02 <NA> NA
3: 2 B 2016-01-03 2016-01-01 2
4: 3 C 2016-01-04 <NA> NA
5: 4 D 2016-01-05 2016-01-02 5
f_1 f_2 d_1 d_2 i_1
1: 1 A 2016-01-01 <NA> NA
2: 2 A 2016-01-02 <NA> NA
3: 2 B 2016-01-03 2016-01-01 1
4: 2 B 2016-01-03 2016-01-01 2
5: 3 C 2016-01-04 <NA> NA
6: 4 D 2016-01-05 2016-01-02 1
7: 4 D 2016-01-05 2016-01-02 2
8: 4 D 2016-01-05 2016-01-02 3
9: 4 D 2016-01-05 2016-01-02 4
10: 4 D 2016-01-05 2016-01-02 5
DT <- data.table(
f_1 = factor(c(1,2,2,3,4)),
f_2 = factor(c("A", "A", "B", "C", "D")),
d_1 = as.Date(c("2016-01-01","2016-01-02","2016-01-03","2016-01-04","2016-01-05")),
d_2 = as.Date(c(NA,NA,"2016-01-01",NA,"2016-01-02")),
i_1 = as.integer(c(NA,NA,2,NA,5)))
Thanks and sorry if it is duplicated. I am struggling with this kind of reshaping exercises.
Here is a data.table
solution. Basically, group by those columns that you want to duplicate and generate sequence of integers using the number in i_1
DT[, .(i_1=if(!is.na(i_1)) seq_len(i_1) else i_1),
by=c(names(DT)[-ncol(DT)])]
output:
f_1 f_2 d_1 d_2 i_1
1: 1 A 2016-01-01 <NA> NA
2: 2 A 2016-01-02 <NA> NA
3: 2 B 2016-01-03 2016-01-01 1
4: 2 B 2016-01-03 2016-01-01 2
5: 3 C 2016-01-04 <NA> NA
6: 4 D 2016-01-05 2016-01-02 1
7: 4 D 2016-01-05 2016-01-02 2
8: 4 D 2016-01-05 2016-01-02 3
9: 4 D 2016-01-05 2016-01-02 4
10: 4 D 2016-01-05 2016-01-02 5
Or another way using data.table
. For each row, create a sequence of numbers using i_1
and add the original data to that sequence with c(.SD[, -"i_1], .....
and finally remove the by
column
DT[, c(.SD[, -"i_1"], .(i_1=if (!is.na(i_1)) seq_len(i_1) else i_1)),
by=seq_len(DT[,.N])][,-1L]