Tags: r, datetime, parsing, readr

Parsing txt file in R


I need to parse a txt file like this:

2021 Sep 27 15:54:50     avg_dur     =      0.321 s
2021 Sep 27 15:54:52     avg_dur     =      0.036 s
2021 Sep 27 15:54:54     avg_dur     =      0.350 s
2021 Sep 27 15:54:56     avg_dur     =      0.317 s

I am interested in parsing the date and the number into an R data frame. I am trying a parser like this (only for the date):

df <- read_table("myFile.txt", col_names = FALSE, col_types = cols(X1 = col_datetime(format = "%Y %b %d %H:%M:%S")))

But it doesn't work:

Warning: 31502 parsing failures.
row col                    expected actual                                                file
  1  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  2  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  3  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  4  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
  5  X1 date like %Y %b %d %H:%M:%S   2021 'uclStats/91.211.159.43-dash_d1_gwv_vos-u5.log-avg'
... ... ........................... ...... ...................................................
See problems(...) for more details.

The problem is clearly that read_table splits the line on whitespace, so the first column holds only the year (2021), yet it is being parsed with the format for the whole date-time.

What is the correct way to parse this txt file into a data frame?

Regards, S.


Solution

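  • 1) read.zoo. Read the file into a zoo object, z, and then convert it to a data frame (or just leave it as a zoo object). Splitting on "=" leaves the text avg_dur at the end of the index column; this works because trailing characters not needed by the format are ignored when converting to POSIXct. The argument comment.char = "s" drops the trailing unit.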

    We have used Lines in the Note at the end for reproducibility but text = Lines can be replaced with "myFile.txt".

    library(zoo)
    
    z <- read.zoo(text = Lines, sep = "=", 
      format = "%Y %b %d %H:%M:%S", tz = "", comment.char = "s")
    fortify.zoo(z)
    

    giving this data frame having POSIXct and numeric columns:

                    Index     z
    1 2021-09-27 15:54:50 0.321
    2 2021-09-27 15:54:52 0.036
    3 2021-09-27 15:54:54 0.350
    4 2021-09-27 15:54:56 0.317
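
    The claim that trailing junk in the index column is ignored can be checked on a single field. This is a minimal sketch: the string x is a hand-copied example of what the index field looks like after splitting on "=", tz = "UTC" is chosen only to make the result reproducible, and %b is assumed to run under an English locale so that "Sep" is recognised.

    ```r
    # One index field as it looks after the line is split on "="
    x <- "2021 Sep 27 15:54:50     avg_dur     "

    # Parsing stops once the format is consumed; the remaining text is ignored
    p <- as.POSIXct(x, format = "%Y %b %d %H:%M:%S", tz = "UTC")

    # Show the parsed time in a fixed layout
    format(p, "%Y-%m-%d %H:%M:%S")
    ```

    If the locale is not English, the result may be NA because the month abbreviation does not match.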
    

    2) Base R. Read the file into a data frame, dd, using the same sep and comment.char arguments, and then convert the first column to POSIXct.

    dd <- read.table(text = Lines, sep = "=", comment.char = "s")
    dd$V1 <- as.POSIXct(dd$V1, format = "%Y %b %d %H:%M:%S")
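
    Treating "s" as a comment character works for these lines, but it would cut a line short if a lowercase s appeared earlier, for example in a different metric name. A possible variant, not part of the answer above, is to keep the unit and strip it with a regular expression. This is only a sketch: it assumes every value ends in the unit s, uses tz = "UTC" for reproducibility, and assumes an English locale for %b.

    ```r
    # Two sample lines from the question, one string per line
    Lines <- c("2021 Sep 27 15:54:50     avg_dur     =      0.321 s",
               "2021 Sep 27 15:54:52     avg_dur     =      0.036 s")

    # Split on "=" only; comment handling is switched off, so the unit is kept
    dd <- read.table(text = Lines, sep = "=", comment.char = "",
                     stringsAsFactors = FALSE)

    # Text after the time in the first column is ignored by the format
    dd$V1 <- as.POSIXct(dd$V1, format = "%Y %b %d %H:%M:%S", tz = "UTC")

    # Drop the trailing unit, trim whitespace, then convert to numeric
    dd$V2 <- as.numeric(trimws(sub("s\\s*$", "", dd$V2)))

    str(dd)
    ```

    As before, text = Lines can be replaced with the file name to read the real file.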
    

    Note

    Lines <- "2021 Sep 27 15:54:50     avg_dur     =      0.321 s
    2021 Sep 27 15:54:52     avg_dur     =      0.036 s
    2021 Sep 27 15:54:54     avg_dur     =      0.350 s
    2021 Sep 27 15:54:56     avg_dur     =      0.317 s"