Search code examples
rtimeroundingposixct

Round POSIXct by fixed intervals of time to avoid lag [R]


I have a dataset containing a POSIXct column named as date. Observations (rows) are separated from each other by intervals of 5 minutes. Unfortunately, after a certain time a lag appear between two observations delaying the rest of the data frame. In that example, 2022-05-31 09:40:00 -> 2022-05-31 09:48:27, instead of 2022-05-31 09:45:00.

I would like to correct it to have back "minutes" values of the POSIXct finishing by 5 or 0 (seconds are not necessary and can be removed).

The final purpose is to merge the dataset with another one. I would then prefer have one missing value due to a lag correction, than an entire part of my dataset despairing due to the impossibility to merge value.

An example of my dataset:

structure(list(date = structure(c(1653983700, 1653984000, 1653984300, 
1653984600, 1653984900, 1653985200, 1653985500, 1653985800, 1653986100, 
1653986400, 1653986700, 1653987000, 1653987300, 1653987600, 1653987900, 
1653988200, 1653988500, 1653988800, 1653989100, 1653989400, 1653989700, 
1653990000, 1653990507, 1653990807, 1653991107, 1653991407, 1653991707, 
1653992007, 1653992307, 1653992607, 1653992907), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Pressure[cmH2O]` = c("983.800", "983.917", "983.800", 
"984.383", "984.325", "984.033", "984.208", "984.325", "984.617", 
"984.208", "984.325", "984.208", "984.325", "984.092", "984.383", 
"984.208", "984.383", "984.500", "984.500", "984.500", "984.500", 
"971.083", "972.367", "984.967", "985.258", "984.792", "984.675", 
"984.792", "984.792", "984.967", "984.967")), row.names = c(NA, 
-31L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • Use lubridate::round_date.

    lubridate::round_date(DF$date, "5 minutes")
    

    Result

     [1] "2022-05-31 07:55:00 UTC" "2022-05-31 08:00:00 UTC"
     [3] "2022-05-31 08:05:00 UTC" "2022-05-31 08:10:00 UTC"
     [5] "2022-05-31 08:15:00 UTC" "2022-05-31 08:20:00 UTC"
     [7] "2022-05-31 08:25:00 UTC" "2022-05-31 08:30:00 UTC"
     [9] "2022-05-31 08:35:00 UTC" "2022-05-31 08:40:00 UTC"
    [11] "2022-05-31 08:45:00 UTC" "2022-05-31 08:50:00 UTC"
    [13] "2022-05-31 08:55:00 UTC" "2022-05-31 09:00:00 UTC"
    [15] "2022-05-31 09:05:00 UTC" "2022-05-31 09:10:00 UTC"
    [17] "2022-05-31 09:15:00 UTC" "2022-05-31 09:20:00 UTC"
    [19] "2022-05-31 09:25:00 UTC" "2022-05-31 09:30:00 UTC"
    [21] "2022-05-31 09:35:00 UTC" "2022-05-31 09:40:00 UTC"
    [23] "2022-05-31 09:50:00 UTC" "2022-05-31 09:55:00 UTC"
    [25] "2022-05-31 10:00:00 UTC" "2022-05-31 10:05:00 UTC"
    [27] "2022-05-31 10:10:00 UTC" "2022-05-31 10:15:00 UTC"
    [29] "2022-05-31 10:20:00 UTC" "2022-05-31 10:25:00 UTC"
    [31] "2022-05-31 10:30:00 UTC"
    

    Or to round to minutes.

    round(DF$date, "mins")
    

    Result

     [1] "2022-05-31 07:55:00 UTC" "2022-05-31 08:00:00 UTC"
     [3] "2022-05-31 08:05:00 UTC" "2022-05-31 08:10:00 UTC"
     [5] "2022-05-31 08:15:00 UTC" "2022-05-31 08:20:00 UTC"
     [7] "2022-05-31 08:25:00 UTC" "2022-05-31 08:30:00 UTC"
     [9] "2022-05-31 08:35:00 UTC" "2022-05-31 08:40:00 UTC"
    [11] "2022-05-31 08:45:00 UTC" "2022-05-31 08:50:00 UTC"
    [13] "2022-05-31 08:55:00 UTC" "2022-05-31 09:00:00 UTC"
    [15] "2022-05-31 09:05:00 UTC" "2022-05-31 09:10:00 UTC"
    [17] "2022-05-31 09:15:00 UTC" "2022-05-31 09:20:00 UTC"
    [19] "2022-05-31 09:25:00 UTC" "2022-05-31 09:30:00 UTC"
    [21] "2022-05-31 09:35:00 UTC" "2022-05-31 09:40:00 UTC"
    [23] "2022-05-31 09:48:00 UTC" "2022-05-31 09:53:00 UTC"
    [25] "2022-05-31 09:58:00 UTC" "2022-05-31 10:03:00 UTC"
    [27] "2022-05-31 10:08:00 UTC" "2022-05-31 10:13:00 UTC"
    [29] "2022-05-31 10:18:00 UTC" "2022-05-31 10:23:00 UTC"
    [31] "2022-05-31 10:28:00 UTC"
    

    Data

    DF <- structure(list(date = structure(c(1653983700, 1653984000, 1653984300, 
    1653984600, 1653984900, 1653985200, 1653985500, 1653985800, 1653986100, 
    1653986400, 1653986700, 1653987000, 1653987300, 1653987600, 1653987900, 
    1653988200, 1653988500, 1653988800, 1653989100, 1653989400, 1653989700, 
    1653990000, 1653990507, 1653990807, 1653991107, 1653991407, 1653991707, 
    1653992007, 1653992307, 1653992607, 1653992907), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Pressure[cmH2O]` = c("983.800", "983.917", "983.800", 
    "984.383", "984.325", "984.033", "984.208", "984.325", "984.617", 
    "984.208", "984.325", "984.208", "984.325", "984.092", "984.383", 
    "984.208", "984.383", "984.500", "984.500", "984.500", "984.500", 
    "971.083", "972.367", "984.967", "985.258", "984.792", "984.675", 
    "984.792", "984.792", "984.967", "984.967")), row.names = c(NA, 
    -31L), class = c("tbl_df", "tbl", "data.frame"))