How to read in timestamps of format %Y-%m-%d %H:%M:%OS3 (and do math with it)?

I have a .txt file (without any clear column delimiter) which, in each line, contains a timestamp of the format %H-%m-%d %H:%M:%OS3 (e.g. "2019-09-26 07:29:22,778") and an event character string. I would like to read-in the data and make a table which shows the intact timestamp in one column, the event in a second, and in a third one the timespan in an OS3 time format (e.g. "1.230" or "1,230" sec) between event in row 1 and event in row 2, then the one between the event in row 1 and the event in row 3 etc..

I have tried reading in the file after using "[" as a delimiter in Excel and saving in .tsv format, which is an unsatisfying work-around. However, the further use of the dplyr difftime function does not lead to a result containing milliseconds, although global options have been set to 3 digits seconds ("options(digits.secs=3)").

what the .txt looks like:

2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z
...
.
.

what I'd like to get:

timestamp                   event            timediff in sec
2019-09-26 17:54:24,406     Event X
2019-09-26 17:54:24,431     Event Y          0.025
2019-09-26 17:54:24,438     Event Z          0.032
...
.
.

Solution

You can use [ as deliminter and read the txt file using read.delim. The issue with the 3 digits is due to the fact that you use a comma as a delimiter and not a dot. This can be fixed using str_replace (or gsub)

library(dplyr)
library(stringr)

my_df <- read.delim(text = "
2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z", 
sep = "[", header = FALSE, col.names = c("timestamp", "info", "event"))

my_df
#                 timestamp          info         event
# 1 2019-09-26 17:54:24,406  218] INFO  -  1] - Event X
# 2 2019-09-26 17:54:24,431  207] INFO  -  1] - Event Y
# 3 2019-09-26 17:54:24,438  218] INFO  -  1] - Event Z

my_df %>% 
  # drop the info column
  select(-info) %>% 
  mutate(# remove anything not related to the Event
         event = str_remove(event, ".*Event"), 
         # replace , with .
         timestamp = str_replace_all(timestamp, ",", "."),
         # transform to a proper timestamp
         timestamp = as.POSIXct(timestamp, format="%Y-%m-%d %H:%M:%OS"), 
         # calculate difftime (as proposed in your previous question [1])
         difftime = difftime(timestamp, timestamp[1], unit = 'sec'))
#                 timestamp event        difftime
# 1 2019-09-26 17:54:24.405     X 0.00000000 secs
# 2 2019-09-26 17:54:24.430     Y 0.02500010 secs
# 3 2019-09-26 17:54:24.437     Z 0.03200006 secs

[1] How to make a timespan column based on a time stamp column?