Search code examples
rtimezonestrptimestring-to-datetime

Using strptime %z with special timezone format


I am working with .csv data that was exported from Teradata. Several columns were originally timestamps with timezones, so after loading the .csv in R I'd like to convert these columns (which are loaded as strings) to POSIXlt or POSIXct. I am using strptime, but the format of the timezone from the .csv file does not match what strptime is expecting. For example, it expects -0400 but the .csv has the format -04:00 where a colon separates the hours and minutes.

I can remove the colon, but this is an extra step and complication I'd like to avoid if possible. Is there a way to tell strptime to use a different format for the timezone (%z)?

Here is an example:

## Example data:
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
format <- "%Y-%m-%d %H:%M:%OS%z"

## Doesn't work:
strptime(x,format)
## [1] NA NA

## Ignores the timezone:
as.POSIXct(x)
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 22:17:13 EDT"

## Remove the last colon:
x2 <- gsub("(.*):", "\\1", x)
x2
## [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"

## This works, but requires extra processing (removing the colon)
strptime(x2,format)
## [1] "2011-10-12 22:17:13" "2011-10-12 18:17:13"

So I'm looking to achieve this last result using something like strptime(x,"%Y-%m-%d %H:%M:%OS%zz"), where %zz is a custom expression for the timezone that recognizes the -04:00 format. Or %zH:%zM might be even better.

If this isn't possible, does anyone have a slick/flexible function for converting strings (of various formats) to dates for multiple columns of a data.frame/data.table?


Solution

  • It turns out lubridate can handle this format:

    library(lubridate)
    ymd_hms(x)
    ## [1] "2011-10-13 02:17:13 UTC" "2011-10-12 22:17:13 UTC"
    

    Or, to display in the local timezone:

    with_tz(ymd_hms(x))
    ## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 18:17:13 EDT"
    

    For more flexibility (still using lubridate):

    parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z")
    

    For faster speed (amongst lubridate options):

    lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")
    

    Timings:

    microbenchmark(
      ymd_hms(x),
      parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z"),
      lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO"),
      strptime(gsub("(.*):", "\\1", x), format)
    )
    
    ## Unit: microseconds
    ##                                               expr      min       lq       mean    median        uq      max neval
    ##                                         ymd_hms(x) 1523.819 1578.495 1715.14577 1629.5385 1744.3695 2850.393   100
    ##         parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z") 1108.676 1150.633 1273.77301 1190.3315 1264.8050 5947.204   100
    ##  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")   89.838  103.390  112.45338  107.8425  115.2265  216.512   100
    ##        strptime(gsub("(.*):", "\\\\1", x), format)   46.716   58.294   71.90934   69.9415   86.5860  105.044   100