I need to parse a txt file like this:
2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s
I am interested in parsing the date and the number into an R data frame. I am trying a parser like this (only for the date):
df <- read_table("myFile.txt", col_names = FALSE, col_types = cols(X1 = col_datetime(format = "%Y %b %d %H:%M:%S")))
But it doesn't work:
Warning: 31502 parsing failures.
row col expected actual file
1 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
2 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
3 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
4 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
5 X1 date like %Y %b %d %H:%M:%S 2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
... ... ........................... ...... ...................................................
See problems(...) for more details.
The problem is clearly that read_table splits on whitespace, so the first column contains only the year (2021), and it then tries to parse that with the format for the whole date-time.
What is the correct way to parse this text file into a data frame?
Regards, S.
1) read.zoo Read it into a zoo object, z, and then convert that to a data frame (or just leave it as a zoo object). This makes use of the fact that junk at the end of the index column will be ignored when converting to POSIXct.
We have used Lines in the Note at the end for reproducibility, but text = Lines can be replaced with "myFile.txt".
library(zoo)
z <- read.zoo(text = Lines, sep = "=",
format = "%Y %b %d %H:%M:%S", tz = "", comment.char = "s")
fortify.zoo(z)
giving this data frame having POSIXct and numeric columns:
Index z
1 2021-09-27 15:54:50 0.321
2 2021-09-27 15:54:52 0.036
3 2021-09-27 15:54:54 0.350
4 2021-09-27 15:54:56 0.317
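As a small illustration of the trailing-junk behaviour this relies on, here is a minimal sketch in base R. The variable names x and p are only for illustration, and tz = "UTC" is an assumption made so the result does not depend on the local time zone. It also assumes a locale in which %b recognises English month abbreviations such as Sep.

```r
# One index value as it arrives after splitting on "=":
# the timestamp followed by leftover text.
x <- "2021 Sep 27 15:54:50 avg_dur "

# strptime-style parsing reads only as much of the string as the
# format needs; the remaining characters are ignored.
# Assumption: tz = "UTC" is chosen here only for reproducibility.
p <- as.POSIXct(x, format = "%Y %b %d %H:%M:%S", tz = "UTC")

format(p, "%Y-%m-%d %H:%M:%S")
```

If the month name is not recognised in the current locale, p will be NA, so it is worth checking for NA values after the conversion.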
2) Base R Read it into a data frame dd and then convert the first column to POSIXct.
dd <- read.table(text = Lines, sep = "=", comment.char = "s")
dd$V1 <- as.POSIXct(dd$V1, format = "%Y %b %d %H:%M:%S")
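Both solutions above use comment.char = "s" to drop the trailing unit, which would also cut a line short if the metric name itself contained a lowercase s. A possible variant that avoids this is sketched below, using a regular expression instead. This is not part of the solutions above: the column names time and avg_dur, the pattern, and tz = "UTC" are assumptions for illustration, and for a real file strsplit(Lines, "\n")[[1]] would be replaced with readLines("myFile.txt").

```r
Lines <- "2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s"

# Split the sample text into one string per line.
x <- strsplit(Lines, "\n")[[1]]

dd <- data.frame(
  # The timestamp sits at the start of each line; the rest is ignored.
  time = as.POSIXct(x, format = "%Y %b %d %H:%M:%S", tz = "UTC"),
  # Keep only the number between "=" and the trailing unit "s".
  avg_dur = as.numeric(sub(".*=\\s*([0-9.]+)\\s*s\\s*$", "\\1", x))
)

str(dd)
```

Lines that do not match the pattern are left unchanged by sub() and become NA in as.numeric() (with a warning), which makes malformed rows easy to spot.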
Note
Lines <- "2021 Sep 27 15:54:50 avg_dur = 0.321 s
2021 Sep 27 15:54:52 avg_dur = 0.036 s
2021 Sep 27 15:54:54 avg_dur = 0.350 s
2021 Sep 27 15:54:56 avg_dur = 0.317 s"