Is there a more efficient way to row bind (or an efficient merge) two or more massive time series with data table? The time series have some different columns, so I use fill = TRUE
.
I want all the rows in each time series to appear in the final data.table. I can do it below, but the time series stamps are not ordered in dt3
below. I have to create dt4
to get the ordered stamps.
I wonder if there is a more efficient way of doing a kind of rbind/time series merge in data.table?
library(data.table)
tm <- seq(as.POSIXct("2018-05-12 00:00"), as.POSIXct("2018-05-14"), by = "hours")
dt <- data.table(time = tm, x = seq(1, length(tm), by = 1))
set.seed(1)
dt2 <- data.table(time = tm[sample(length(tm), size = 8)] + rnorm(n = 8, 0, 60),
y = rnorm(8))
# Can a one liner here get me the output in `dt4` with some kind of row bind?
# Is there a way to do a row bind here instead that avoids the creation of a new object dt4 that takes the sorted rows?
dt3 <- rbind(dt, dt2, fill = TRUE)
dt4 <- dt3[order(time)]
tail(dt4, 20)
# time x y
# 1: 2018-05-13 08:00:00 33 NA
# 2: 2018-05-13 09:00:00 34 NA
# 3: 2018-05-13 10:00:00 35 NA
# 4: 2018-05-13 11:00:00 36 NA
# 5: 2018-05-13 12:00:00 37 NA
# 6: 2018-05-13 13:00:00 38 NA
# 7: 2018-05-13 14:00:00 39 NA
# 8: 2018-05-13 14:59:41 NA 0.94383621
# 9: 2018-05-13 15:00:00 40 NA
# 10: 2018-05-13 16:00:00 41 NA
# 11: 2018-05-13 16:01:30 NA 0.82122120
# 12: 2018-05-13 17:00:00 42 NA
# 13: 2018-05-13 17:00:44 NA -0.04493361
# 14: 2018-05-13 18:00:00 43 NA
# 15: 2018-05-13 19:00:00 44 NA
# 16: 2018-05-13 20:00:00 45 NA
# 17: 2018-05-13 21:00:00 46 NA
# 18: 2018-05-13 22:00:00 47 NA
# 19: 2018-05-13 23:00:00 48 NA
# 20: 2018-05-14 00:00:00 49 NA
If you have the time columns set as keys
setkey(dt, time)
setkey(dt2, time)
Then you can use merge.data.table
merge(dt,dt2,all=TRUE)
Note, if the time series are already known to be sorted (which dt is, but dt2 is not), you can speed up a bit more by just setting the 'sorted' attribute of the data.tables, rather than calling setkey
.
setattr(dt, "sorted", "time")