r dplyr lubridate posixct strptime

Parsing and working with microsecond-precision timestamps in R using dplyr


I'm planning to collect data with microsecond-precision timestamps in the format "DD/MM/YY HH:MM:SS.mmmuuun". How would I parse these in R and keep working with them in a dplyr pipeline?

Example:

"26/09/24 11a:16:36.5933210"
df <- data.frame(
  ID = c(1, 1, 1),
  Timestamp = c("26/09/24 11a:16:36.5933210",
                "26/09/24 11a:16:36.6518648",
                "26/09/24 11p:16:36.7280308"),
  CH = c(11, 11, 11)
)

I would still like to retain the date and time information for later correlation with observations. Note that the "a" or "p" after the hour stands for am or pm.


Solution

  • Using dmy_hms with AM/PM moved to the end of the string.

    library(dplyr)
    library(lubridate)
    
    df %>% 
      mutate(Timestamp_new = dmy_hms(sub("( \\d{2})([ap])(.*$)", "\\1\\3 \\2m", Timestamp)),
             ms = format(Timestamp_new, "%OS3"))
      ID                  Timestamp CH       Timestamp_new     ms
    1  1 26/09/24 11a:16:36.5933210 11 2024-09-26 11:16:36 36.593
    2  1 26/09/24 11a:16:36.6518648 11 2024-09-26 11:16:36 36.651
    3  1 26/09/24 11p:16:36.7280308 11 2024-09-26 23:16:36 36.728
    

    Note that n in %OSn can be at most 6 (0 <= n <= 6). Applied to the same rearranged string, as.POSIXct with format = "%d/%m/%y %I:%M:%OS %p" works too. Thanks @thelatemail for pointing out the right format.
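
    For reference, here is a minimal base R sketch of the as.POSIXct variant mentioned above, including the intermediate string that sub() produces. The variable names (ts, rearranged, parsed) and tz = "UTC" are assumptions for illustration only, and the %p matching assumes a locale that recognises am/pm (for example an English or C locale).

```r
# Base R only; no packages are needed for this variant.
ts <- c("26/09/24 11a:16:36.5933210",
        "26/09/24 11a:16:36.6518648",
        "26/09/24 11p:16:36.7280308")

# Move the a/p marker from after the hour to the end of the string
# and expand it to am/pm, e.g. "26/09/24 11:16:36.5933210 am".
rearranged <- sub("( \\d{2})([ap])(.*$)", "\\1\\3 \\2m", ts)
print(rearranged)

# %I is the 12-hour clock, %OS reads seconds including the fraction,
# %p reads the am/pm marker. tz = "UTC" is an assumed time zone.
parsed <- as.POSIXct(rearranged,
                     format = "%d/%m/%y %I:%M:%OS %p",
                     tz = "UTC")

# Show up to six fractional digits (the maximum for %OSn).
print(format(parsed, "%Y-%m-%d %H:%M:%OS6"))
```

    POSIXct stores the time as a double, and %OSn truncates rather than rounds, so the last printed digit can differ slightly from the input string. If exact microsecond values matter, it may be safer to keep the original character column alongside the parsed one.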