Search code examples
rdata.tablelubridateradixposixct

Is there a method to round date without memory exceed in r?


I am working on a dataset where I am rounding the posixct date to the nearest hour of one column of data.table with 88 mln rows.

I used round. Date by base and round_date by lubridate, both of them exceeded memory, so could not finish. Finally, I divided the dataset into 4 equal parts, rounded the necessary column, and bound back.

Is there a good method to round date with higher memory efficiency

Thanks in advance


Solution

  • This should wowrk nice and fast

    #sample data
    library( data.table )
    n = 1000000
    set.seed(123)
    DT <- data.table( id = 1:n, 
                      timestamp = sample(seq(as.POSIXct('2017/01/01'), as.POSIXct('2020/05/01'), by="5 mins"), replace = TRUE, n) )
    
    #split timestamp to iDate and iTime
    DT[, c("date", "time") := IDateTime( timestamp ) ]
    #round the iTime
    DT[, time_rounded := round( time, units = "hour" )]
    #convert iDate and rounded iTime back to posixct (add timezone if needed)
    DT[, timestamp_rounded := as.POSIXct( time_rounded, date = date ) ]
    

    possible issue: rounding to 00:00 the next day... you should test this and adjust the date if needed...