I am working on a dataset where I need to round a POSIXct timestamp column to the nearest hour in a data.table with 88 million rows.
I tried round.Date from base and round_date from lubridate; both ran out of memory and could not finish. In the end I split the dataset into 4 equal parts, rounded the column in each part, and bound them back together.
Is there a more memory-efficient way to round dates?
Thanks in advance
This should work nicely and fast
#sample data
library( data.table )
n = 1000000
set.seed(123)
DT <- data.table( id = 1:n,
timestamp = sample(seq(as.POSIXct('2017/01/01'), as.POSIXct('2020/05/01'), by="5 mins"), replace = TRUE, n) )
#split timestamp to iDate and iTime
DT[, c("date", "time") := IDateTime( timestamp ) ]
#round the iTime to the nearest hour (round.ITime takes digits = "hours" or "minutes")
DT[, time_rounded := round( time, digits = "hours" )]
#convert iDate and rounded iTime back to posixct (add timezone if needed)
DT[, timestamp_rounded := as.POSIXct( time_rounded, date = date ) ]
possible issue: times near midnight may round up to 24:00:00, which should roll over to 00:00 of the next day... you should test this and adjust the date if needed...
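If the rollover turns out to be a problem, here is a sketch of an alternative that sidesteps it entirely: round the underlying numeric epoch seconds directly and convert back to POSIXct. This stays vectorised and memory-lean (one numeric vector of intermediates), and midnight rolls over naturally. The tz = "UTC" below is an assumption; use your data's timezone, and note that rounding epoch seconds aligns with local clock hours only for whole-hour UTC offsets.

```r
library(data.table)

# toy data: one timestamp that rounds up past midnight
DT <- data.table(timestamp = as.POSIXct("2019-12-31 23:45:00", tz = "UTC"))

# round to the nearest hour via integer arithmetic on epoch seconds
DT[, timestamp_rounded := as.POSIXct(round(as.numeric(timestamp) / 3600) * 3600,
                                     origin = "1970-01-01", tz = "UTC")]

DT[]   # timestamp_rounded is 2020-01-01 00:00:00 UTC
```

For 88 mln rows you could also update the column by reference in chunks of row indices if even the single numeric intermediate is too much.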