
How to read in timestamps of format %Y-%m-%d %H:%M:%OS3 (and do math with them)?


I have a .txt file (without any clear column delimiter) in which each line contains a timestamp of the format %Y-%m-%d %H:%M:%OS3 (e.g. "2019-09-26 07:29:22,778") and an event string. I would like to read in the data and build a table with the intact timestamp in the first column, the event in the second, and in the third the time span with millisecond precision (e.g. "1.230" or "1,230" sec) between the event in row 1 and the event in each later row: row 1 to row 2, then row 1 to row 3, etc.

I have tried reading in the file after splitting it in Excel with "[" as the delimiter and saving it in .tsv format, which is an unsatisfying workaround. Moreover, computing the differences afterwards with difftime (a base R function, used here in a dplyr pipeline) does not give a result containing milliseconds, even though the global option for fractional seconds has been set to 3 digits ("options(digits.secs=3)").
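The symptom can be reproduced in isolation. The base-R sketch below suggests that digits.secs is not the culprit: it only controls how times are printed, while the comma stops %OS from reading the fractional part (strptime ignores trailing characters it does not need). The time zone "UTC" is an assumption, chosen only to keep the example free of daylight-saving effects.

```r
# Minimal sketch: why the milliseconds disappear (tz = "UTC" is an assumption)
options(digits.secs = 3)  # affects printing/formatting only, not parsing

x <- "2019-09-26 17:54:24,406"

# With a comma, %OS reads "24" and stops at ",": the fraction is lost
with_comma <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# With a dot, %OS reads "24.406" including the fractional part
with_dot <- as.POSIXct(sub(",", ".", x, fixed = TRUE),
                       format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

as.numeric(with_comma) %% 1  # 0: no fractional seconds were parsed
as.numeric(with_dot) %% 1    # roughly 0.406
```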

What the .txt looks like:

2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z
...
.
.

What I'd like to get:

timestamp                   event            timediff in sec
2019-09-26 17:54:24,406     Event X
2019-09-26 17:54:24,431     Event Y          0.025
2019-09-26 17:54:24,438     Event Z          0.032
...
.
.

Solution

  • You can use "[" as the delimiter and read the .txt file with read.delim. The milliseconds go missing because the file uses a comma as the decimal separator instead of a dot, and the %OS format does not recognise a comma as the decimal mark. This can be fixed with str_replace_all (or gsub) before converting.

    library(dplyr)
    library(stringr)
    
    my_df <- read.delim(text = "
    2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
    2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
    2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z", 
    sep = "[", header = FALSE, col.names = c("timestamp", "info", "event"))
    
    my_df
    #                 timestamp          info         event
    # 1 2019-09-26 17:54:24,406  218] INFO  -  1] - Event X
    # 2 2019-09-26 17:54:24,431  207] INFO  -  1] - Event Y
    # 3 2019-09-26 17:54:24,438  218] INFO  -  1] - Event Z
    
    my_df %>% 
      # drop the info column
      select(-info) %>% 
      mutate(# keep only the event label after "Event" (X, Y, Z)
             event = str_trim(str_remove(event, ".*Event")), 
             # replace , with .
             timestamp = str_replace_all(timestamp, ",", "."),
             # transform to a proper timestamp
             timestamp = as.POSIXct(timestamp, format="%Y-%m-%d %H:%M:%OS"), 
             # calculate difftime (as proposed in your previous question [1])
             difftime = difftime(timestamp, timestamp[1], units = "secs"))
    #                 timestamp event        difftime
    # 1 2019-09-26 17:54:24.405     X 0.00000000 secs
    # 2 2019-09-26 17:54:24.430     Y 0.02500010 secs
    # 3 2019-09-26 17:54:24.437     Z 0.03200006 secs
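If the Excel step and the extra packages are to be avoided altogether, the raw lines can also be parsed directly in base R. This is a minimal sketch, not part of the answer above; it assumes every line starts with the 23-character timestamp and that the event text follows the last " - " on the line. The file name "log.txt" is hypothetical.

```r
# Minimal base-R sketch (assumes: 23-char timestamp first, event after last " - ")
lines <- c(
  "2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X",
  "2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y",
  "2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z"
)
# For a real file: lines <- readLines("log.txt")  # hypothetical file name

# The first 23 characters hold the timestamp exactly as written in the file
stamp <- substr(lines, 1, 23)

# Greedy ".*" strips everything up to the last " - ", leaving "Event X" etc.
event <- sub(".* - ", "", lines)

# Swap the decimal comma for a dot so %OS reads the milliseconds
time <- as.POSIXct(sub(",", ".", stamp, fixed = TRUE),
                   format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Seconds since the first event, rounded to whole milliseconds
diff_sec <- round(as.numeric(difftime(time, time[1], units = "secs")), 3)

result <- data.frame(timestamp = stamp, event = event, diff_sec = diff_sec,
                     stringsAsFactors = FALSE)
result
```

Keeping the original timestamp string as its own column also sidesteps the display quirk visible in the output above (.405 instead of .406): R truncates fractional seconds when printing a POSIXct, and the stored double is a hair below .406. Rounding the difference to 3 decimals hides values such as 0.02500010 in the same way.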
    

[1] How to make a timespan column based on a time stamp column?