I have a dataset containing a POSIXct column named as date
. Observations (rows) are separated from each other by intervals of 5 minutes. Unfortunately, after a certain time a lag appear between two observations delaying the rest of the data frame.
In that example, 2022-05-31 09:40:00
-> 2022-05-31 09:48:27
, instead of 2022-05-31 09:45:00
.
I would like to correct it to have back "minutes" values of the POSIXct finishing by 5 or 0 (seconds are not necessary and can be removed).
The final purpose is to merge the dataset with another one. I would then prefer have one missing value due to a lag correction, than an entire part of my dataset despairing due to the impossibility to merge value.
An example of my dataset:
structure(list(date = structure(c(1653983700, 1653984000, 1653984300,
1653984600, 1653984900, 1653985200, 1653985500, 1653985800, 1653986100,
1653986400, 1653986700, 1653987000, 1653987300, 1653987600, 1653987900,
1653988200, 1653988500, 1653988800, 1653989100, 1653989400, 1653989700,
1653990000, 1653990507, 1653990807, 1653991107, 1653991407, 1653991707,
1653992007, 1653992307, 1653992607, 1653992907), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Pressure[cmH2O]` = c("983.800", "983.917", "983.800",
"984.383", "984.325", "984.033", "984.208", "984.325", "984.617",
"984.208", "984.325", "984.208", "984.325", "984.092", "984.383",
"984.208", "984.383", "984.500", "984.500", "984.500", "984.500",
"971.083", "972.367", "984.967", "985.258", "984.792", "984.675",
"984.792", "984.792", "984.967", "984.967")), row.names = c(NA,
-31L), class = c("tbl_df", "tbl", "data.frame"))
Use lubridate::round_date.
lubridate::round_date(DF$date, "5 minutes")
Result
[1] "2022-05-31 07:55:00 UTC" "2022-05-31 08:00:00 UTC"
[3] "2022-05-31 08:05:00 UTC" "2022-05-31 08:10:00 UTC"
[5] "2022-05-31 08:15:00 UTC" "2022-05-31 08:20:00 UTC"
[7] "2022-05-31 08:25:00 UTC" "2022-05-31 08:30:00 UTC"
[9] "2022-05-31 08:35:00 UTC" "2022-05-31 08:40:00 UTC"
[11] "2022-05-31 08:45:00 UTC" "2022-05-31 08:50:00 UTC"
[13] "2022-05-31 08:55:00 UTC" "2022-05-31 09:00:00 UTC"
[15] "2022-05-31 09:05:00 UTC" "2022-05-31 09:10:00 UTC"
[17] "2022-05-31 09:15:00 UTC" "2022-05-31 09:20:00 UTC"
[19] "2022-05-31 09:25:00 UTC" "2022-05-31 09:30:00 UTC"
[21] "2022-05-31 09:35:00 UTC" "2022-05-31 09:40:00 UTC"
[23] "2022-05-31 09:50:00 UTC" "2022-05-31 09:55:00 UTC"
[25] "2022-05-31 10:00:00 UTC" "2022-05-31 10:05:00 UTC"
[27] "2022-05-31 10:10:00 UTC" "2022-05-31 10:15:00 UTC"
[29] "2022-05-31 10:20:00 UTC" "2022-05-31 10:25:00 UTC"
[31] "2022-05-31 10:30:00 UTC"
Or to round to minutes.
round(DF$date, "mins")
Result
[1] "2022-05-31 07:55:00 UTC" "2022-05-31 08:00:00 UTC"
[3] "2022-05-31 08:05:00 UTC" "2022-05-31 08:10:00 UTC"
[5] "2022-05-31 08:15:00 UTC" "2022-05-31 08:20:00 UTC"
[7] "2022-05-31 08:25:00 UTC" "2022-05-31 08:30:00 UTC"
[9] "2022-05-31 08:35:00 UTC" "2022-05-31 08:40:00 UTC"
[11] "2022-05-31 08:45:00 UTC" "2022-05-31 08:50:00 UTC"
[13] "2022-05-31 08:55:00 UTC" "2022-05-31 09:00:00 UTC"
[15] "2022-05-31 09:05:00 UTC" "2022-05-31 09:10:00 UTC"
[17] "2022-05-31 09:15:00 UTC" "2022-05-31 09:20:00 UTC"
[19] "2022-05-31 09:25:00 UTC" "2022-05-31 09:30:00 UTC"
[21] "2022-05-31 09:35:00 UTC" "2022-05-31 09:40:00 UTC"
[23] "2022-05-31 09:48:00 UTC" "2022-05-31 09:53:00 UTC"
[25] "2022-05-31 09:58:00 UTC" "2022-05-31 10:03:00 UTC"
[27] "2022-05-31 10:08:00 UTC" "2022-05-31 10:13:00 UTC"
[29] "2022-05-31 10:18:00 UTC" "2022-05-31 10:23:00 UTC"
[31] "2022-05-31 10:28:00 UTC"
Data
DF <- structure(list(date = structure(c(1653983700, 1653984000, 1653984300,
1653984600, 1653984900, 1653985200, 1653985500, 1653985800, 1653986100,
1653986400, 1653986700, 1653987000, 1653987300, 1653987600, 1653987900,
1653988200, 1653988500, 1653988800, 1653989100, 1653989400, 1653989700,
1653990000, 1653990507, 1653990807, 1653991107, 1653991407, 1653991707,
1653992007, 1653992307, 1653992607, 1653992907), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Pressure[cmH2O]` = c("983.800", "983.917", "983.800",
"984.383", "984.325", "984.033", "984.208", "984.325", "984.617",
"984.208", "984.325", "984.208", "984.325", "984.092", "984.383",
"984.208", "984.383", "984.500", "984.500", "984.500", "984.500",
"971.083", "972.367", "984.967", "985.258", "984.792", "984.675",
"984.792", "984.792", "984.967", "984.967")), row.names = c(NA,
-31L), class = c("tbl_df", "tbl", "data.frame"))