I have a data table like (data is not necessarily ordered by 'col1')
col0 col1 col2
1: abc 1 a
2: abc 2 b
3: abc 3 c
4: abc 4 d
5: abc 5 e
6: def 1 a
7: def 2 b
8: def 3 c
9: def 4 d
10: def 5 e
I want to reshape it the following way
col0 col1 col2 new_1 new_2 new_3 new_4
1: abc 1 a NA NA NA NA
2: abc 2 b a NA NA NA
3: abc 3 c b a NA NA
4: abc 4 d c b a NA
5: abc 5 e d c b a
6: def 1 a NA NA NA NA
7: def 2 b a NA NA NA
8: def 3 c b a NA NA
9: def 4 d c b a NA
10: def 5 e d c b a
Basically I want to get previously occurred values of col2 for each row in the same row as above and if there is none the corresponding new column should say NA.
I can of course do it by merge on col2 5 times but I need to do this on a large table (in that case I will have to merge 20-30 times).
What is the best way to achieve it in R in 1 or 2 lines?
We can use shift
from the devel version of data.table i.e. v1.9.5
(Instructions to install the devel version are here
. By default, the type
in shift
is lag
. We can specify n
as a vector, in this case 1:4
. We assign (:=
) the output to new columns.
library(data.table)#v1.9.5+
DT[, paste('new', 1:4, sep="_") := shift(col2, 1:4)]
DT
# col1 col2 new_1 new_2 new_3 new_4
#1: 1 a NA NA NA NA
#2: 2 b a NA NA NA
#3: 3 c b a NA NA
#4: 4 d c b a NA
#5: 5 e d c b a
For the new dataset 'DT2', we need to group by 'col0' and then do the shift
on 'col2'
DT2[, paste('new', 1:4, sep="_") := shift(col2, 1:4), by = col0]
DT2
# col0 col1 col2 new_1 new_2 new_3 new_4
# 1: abc 1 a NA NA NA NA
# 2: abc 2 b a NA NA NA
# 3: abc 3 c b a NA NA
# 4: abc 4 d c b a NA
# 5: abc 5 e d c b a
# 6: def 1 a NA NA NA NA
# 7: def 2 b a NA NA NA
# 8: def 3 c b a NA NA
# 9: def 4 d c b a NA
#10: def 5 e d c b a
df1 <- structure(list(col1 = 1:5, col2 = c("a", "b", "c", "d", "e"),
new_1 = c(NA, "a", "b", "c", "d"), new_2 = c(NA, NA, "a",
"b", "c"), new_3 = c(NA, NA, NA, "a", "b"), new_4 = c(NA,
NA, NA, NA, "a")), .Names = c("col1", "col2", "new_1", "new_2",
"new_3", "new_4"), class = "data.frame", row.names = c(NA, -5L
))
DT <- as.data.table(df1)
df2 <- structure(list(col0 = c("abc", "abc", "abc", "abc", "abc",
"def",
"def", "def", "def", "def"), col1 = c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L), col2 = c("a", "b", "c", "d", "e", "a", "b",
"c", "d", "e")), .Names = c("col0", "col1", "col2"),
class = "data.frame", row.names = c(NA, -10L))
DT2 <- as.data.table(df2)