Search code examples
rdatetimetimezonetimezone-offset

How to read in a character datetime with a timezone offset in R?


I have a dataset containing a datetime character column, in CEST/CET time zone (Central Europe local time). The time zone offset is denoted with +01:00/+02:00 at the end. I want to convert this into POSIXct format so it can later be converted to UTC, but on the October long clock change day the additional hour from 2 am to 3 am is read in incorrectly, as the time zone offset seems to be ignored:

Screenshot of source dataframe

My goal is to make reprex$datetime_CEST_CET_converted[4] return "2023-10-29 02:00:00 CET" instead of "2023-10-29 02:00:00 CEST":

Screenshot of console

Reprex:

library(dplyr)
library(lubridate)

source <- data.frame(
  datetime_CEST_CET_character = c("2023-10-29 00:00+02:00", "2023-10-29 01:00+02:00", "2023-10-29 02:00+02:00",
                                  "2023-10-29 02:00+01:00", "2023-10-29 03:00+01:00", "2023-10-29 04:00+01:00")
)

reprex <- source %>%
  mutate(datetime_CEST_CET_converted = as.POSIXct(datetime_CEST_CET_character, tz = "Europe/Paris"),
         datetime_UTC = with_tz(datetime_CEST_CET_converted, tzone = "UTC"))

reprex$datetime_CEST_CET_converted[3]

reprex$datetime_CEST_CET_converted[4]

reprex$datetime_CEST_CET_converted[5] - hours(1)

I have tried adding format="%Y-%m-%d %H:%M+%z" in as.POSIXct() after removing colons in the time zone offset but I get NAs as a result:

Screenshot of source dataframe without colons

Screenshot of console without colons

source_without_colon_in_timezone <- data.frame(
  datetime_CEST_CET_character = c("2023-10-29 00:00+0200", "2023-10-29 01:00+0200", "2023-10-29 02:00+0200",
                                  "2023-10-29 02:00+0100", "2023-10-29 03:00+0100", "2023-10-29 04:00+0100")
)

reprex_without_colon_in_timezone <- source_without_colon_in_timezone %>%
  mutate(datetime_CEST_CET_converted = as.POSIXct(datetime_CEST_CET_character, format="%Y-%m-%d %H:%M+%z", tz = "Europe/Paris"),
         datetime_UTC = with_tz(datetime_CEST_CET_converted, tzone = "UTC"))

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[3]

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[4]

reprex_without_colon_in_timezone$datetime_CEST_CET_converted[5] - hours(1)

Solution

  • Just use clock and call it a day! It has support for the +HH:MM offset-from-utc convention seen here.

    Your date-time is almost in the standard RFC 3339 format, in which case you could just use date_time_parse_RFC_3339(), but the lack of seconds make it slightly more complex. Instead we parse with the more flexible sys_time_parse() and convert the results into POSIXct with as_date_time().

    library(clock)
    
    x <- c(
      "2023-10-29 00:00+02:00",
      "2023-10-29 01:00+02:00",
      "2023-10-29 02:00+02:00",
      "2023-10-29 02:00+01:00",
      "2023-10-29 03:00+01:00",
      "2023-10-29 04:00+01:00"
    )
    
    # Parse into (roughly) UTC, respecting `%Ez`, i.e. the `+HH:MM` bit
    x <- sys_time_parse(
      x,
      format = "%Y-%m-%d %H:%M%Ez",
      precision = "minute"
    )
    x
    #> <sys_time<minute>[6]>
    #> [1] "2023-10-28T22:00" "2023-10-28T23:00" "2023-10-29T00:00" "2023-10-29T01:00"
    #> [5] "2023-10-29T02:00" "2023-10-29T03:00"
    
    # Convert to POSIXct with your expected time zone
    as_date_time(x, zone = "Europe/Paris")
    #> [1] "2023-10-29 00:00:00 CEST" "2023-10-29 01:00:00 CEST"
    #> [3] "2023-10-29 02:00:00 CEST" "2023-10-29 02:00:00 CET" 
    #> [5] "2023-10-29 03:00:00 CET"  "2023-10-29 04:00:00 CET"
    
    # Or UTC if you wanted that
    as_date_time(x, zone = "UTC")
    #> [1] "2023-10-28 22:00:00 UTC" "2023-10-28 23:00:00 UTC"
    #> [3] "2023-10-29 00:00:00 UTC" "2023-10-29 01:00:00 UTC"
    #> [5] "2023-10-29 02:00:00 UTC" "2023-10-29 03:00:00 UTC"