I have a .txt file (without any clear column delimiter) which, in each line, contains a timestamp of the format %H-%m-%d %H:%M:%OS3 (e.g. "2019-09-26 07:29:22,778") and an event character string. I would like to read-in the data and make a table which shows the intact timestamp in one column, the event in a second, and in a third one the timespan in an OS3 time format (e.g. "1.230" or "1,230" sec) between event in row 1 and event in row 2, then the one between the event in row 1 and the event in row 3 etc..
I have tried reading in the file after using "[" as a delimiter in Excel and saving in .tsv format, which is an unsatisfying work-around. However, the further use of the dplyr difftime function does not lead to a result containing milliseconds, although global options have been set to 3 digits seconds ("options(digits.secs=3)").
what the .txt looks like:
2019-09-26 17:54:24,406 [218] INFO - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z
...
.
.
what I'd like to get:
timestamp event timediff in sec
2019-09-26 17:54:24,406 Event X
2019-09-26 17:54:24,431 Event Y 0.025
2019-09-26 17:54:24,438 Event Z 0.032
...
.
.
You can use [ as deliminter and read the txt file using read.delim
. The issue with the 3 digits is due to the fact that you use a comma as a delimiter and not a dot. This can be fixed using str_replace
(or gsub
)
library(dplyr)
library(stringr)
my_df <- read.delim(text = "
2019-09-26 17:54:24,406 [218] INFO - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z",
sep = "[", header = FALSE, col.names = c("timestamp", "info", "event"))
my_df
# timestamp info event
# 1 2019-09-26 17:54:24,406 218] INFO - 1] - Event X
# 2 2019-09-26 17:54:24,431 207] INFO - 1] - Event Y
# 3 2019-09-26 17:54:24,438 218] INFO - 1] - Event Z
my_df %>%
# drop the info column
select(-info) %>%
mutate(# remove anything not related to the Event
event = str_remove(event, ".*Event"),
# replace , with .
timestamp = str_replace_all(timestamp, ",", "."),
# transform to a proper timestamp
timestamp = as.POSIXct(timestamp, format="%Y-%m-%d %H:%M:%OS"),
# calculate difftime (as proposed in your previous question [1])
difftime = difftime(timestamp, timestamp[1], unit = 'sec'))
# timestamp event difftime
# 1 2019-09-26 17:54:24.405 X 0.00000000 secs
# 2 2019-09-26 17:54:24.430 Y 0.02500010 secs
# 3 2019-09-26 17:54:24.437 Z 0.03200006 secs
[1] How to make a timespan column based on a time stamp column?