R: if_else and timezone forcing


I have a weird problem (maybe I'm missing something?) when trying to force a timezone within if_else (which I use because ifelse does not seem to handle POSIXct well). It seems to force the timezone only when the condition is TRUE, but to convert it when the condition is FALSE. Why, and how can I fix it?

library(lubridate)
library(dplyr)
some_date = ymd_hm("2020-06-01 17:45", tz = "America/New_York")

if_else(TRUE, force_tz(some_date, tz = "GMT"), force_tz(some_date, tz = "Singapore"))
[1] "2020-06-01 17:45:00 GMT"

if_else(FALSE, force_tz(some_date, tz = "GMT"), force_tz(some_date, tz = "Singapore"))
[1] "2020-06-01 09:45:00 GMT"

I would expect the same outcome as running force_tz alone:

# if TRUE
force_tz(some_date, tz = "GMT")
[1] "2020-06-01 17:45:00 GMT"

# if FALSE
force_tz(some_date, tz = "Singapore")
[1] "2020-06-01 17:45:00 +08"

Thanks!


Solution

  • The culprit is in how dplyr::if_else is making the adjustments.

    First, my original comment about vectors and TZ still stands, and is still at the heart of this problem. For the record:

    When you're dealing with POSIXt in a vector, the TZ is an attribute of the whole vector, not each independent element. This means that either (a) you must accept that all timestamps within a vector will have the same TZ; or (b) you need to adapt your process to deal with a list of timestamps, in which case each time can have its own TZ.
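
    As a minimal sketch of options (a) and (b), using only base R (the variable names and the example times are illustrative, not taken from the question):

    ```r
    # (a) One POSIXct vector: a single tzone attribute covers every element.
    times <- as.POSIXct(c("2020-06-01 17:45:00", "2020-06-02 08:00:00"), tz = "GMT")
    secs_before <- as.numeric(times)

    # Changing the attribute changes how all elements are displayed,
    # but the underlying instants (seconds since the epoch) stay the same.
    attr(times, "tzone") <- "Asia/Singapore"
    shown <- format(times, "%Y-%m-%d %H:%M")
    shown
    # [1] "2020-06-02 01:45" "2020-06-02 16:00"
    identical(as.numeric(times), secs_before)
    # [1] TRUE

    # (b) A list holds separate POSIXct objects, so each keeps its own tzone.
    time_list <- list(times[1], times[2])
    attr(time_list[[2]], "tzone") <- "America/New_York"
    list_tzones <- vapply(time_list, function(p) attr(p, "tzone"), character(1))
    list_tzones
    # [1] "Asia/Singapore"   "America/New_York"
    ```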

    If you look at if_else:

    function (condition, true, false, missing = NULL) 
    {
        if (!is.logical(condition)) {
            bad_args("condition", "must be a logical vector, not {friendly_type_of(condition)}")
        }
        out <- true[rep(NA_integer_, length(condition))]
        # ... (remainder of the function omitted)
    

    which prepopulates the out vector with NA variants of the first ("true") vector. (This is necessary because R really has at least 6 types of NA: logical (NA), integer (NA_integer_), real/float (NA_real_), string (NA_character_), date (c.Date(NA)), and time (c.POSIXct(NA)); so how one forms a vector of NA is important.) Once the vector of NAs is prepopulated, though, realize that this is being based on the first vector, so its attributes are brought into the out vector.

    Sys.time()
    # [1] "2020-06-01 09:02:06 PDT"
    now <- Sys.time()
    attr(now, "tzone") <- "GMT"
    dput(now)
    # structure(1591027335.41804, class = c("POSIXct", "POSIXt"), tzone = "GMT")
    dput(now[NA])
    # structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "GMT")
    

    (see how tzone= is still the same). This means that the output vector (when operating on POSIXt vectors) will always carry forward the TZ of the true argument to if_else.

    From here, if_else works by replacement (using its internal replace_with, which effectively just does out[condition] <- true[condition] and then out[!condition] <- false[!condition]). Replacement does not affect the TZ of out; in fact, the numeric equivalents of the false times are assimilated without regard to their TZ. Granted, the "absolute time in the world" for the false vector is preserved, but it is then displayed in the TZ inherited from true.
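
    The two steps can be imitated by hand in base R. This is only a sketch of the behaviour described above, not dplyr's actual code; as.POSIXct(..., tz = ) stands in for force_tz here, and "Asia/Singapore" is used as the canonical name for the "Singapore" zone:

    ```r
    # Same wall-clock time, two different zones (what force_tz would produce).
    true_val  <- as.POSIXct("2020-06-01 17:45:00", tz = "GMT")
    false_val <- as.POSIXct("2020-06-01 17:45:00", tz = "Asia/Singapore")
    condition <- FALSE

    # Step 1: prepopulate with NA taken from the "true" vector (keeps tzone = "GMT").
    out <- true_val[rep(NA_integer_, length(condition))]

    # Step 2: replacement copies the instant, not the time zone.
    out[!condition] <- false_val[!condition]

    attr(out, "tzone")
    # [1] "GMT"
    format(out, "%Y-%m-%d %H:%M:%S")
    # [1] "2020-06-01 09:45:00"
    as.numeric(out) == as.numeric(false_val)
    # [1] TRUE
    ```

    The result is the Singapore instant shown in GMT, which matches the "09:45:00 GMT" output in the question.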

    The only workaround is to change your workflow to deal with a list of POSIXt instead of a vector. if_else still works there.

    now
    # [1] "2020-06-01 16:02:15 GMT"
    now1 <- list(now, now + 1)
    now2 <- list(now + 86400, now + 86401)
    now1
    # [[1]]
    # [1] "2020-06-01 16:02:15 GMT"
    # [[2]]
    # [1] "2020-06-01 16:02:16 GMT"
    now2
    # [[1]]
    # [1] "2020-06-02 16:02:15 GMT"
    # [[2]]
    # [1] "2020-06-02 16:02:16 GMT"
    attr(now1[[2]], "tzone") <- "Singapore"
    attr(now2[[2]], "tzone") <- "US/Pacific"
    now1
    # [[1]]
    # [1] "2020-06-01 16:02:15 GMT"
    # [[2]]
    # [1] "2020-06-02 00:02:16 +08"
    now2
    # [[1]]
    # [1] "2020-06-02 16:02:15 GMT"
    # [[2]]
    # [1] "2020-06-02 09:02:16 PDT"
    if_else(c(TRUE, FALSE), now1, now2)
    # [[1]]
    # [1] "2020-06-01 16:02:15 GMT"
    # [[2]]
    # [1] "2020-06-02 09:02:16 PDT"
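
    Applied to the times from the question, the list approach can also be sketched without if_else at all. The helper pick_each below is hypothetical (it is not part of dplyr or lubridate), and as.POSIXct(..., tz = ) again stands in for force_tz so that the example needs only base R:

    ```r
    gmt_time <- as.POSIXct("2020-06-01 17:45:00", tz = "GMT")
    sgp_time <- as.POSIXct("2020-06-01 17:45:00", tz = "Asia/Singapore")

    # Hypothetical helper: element-wise choice between two lists of POSIXct,
    # returning a list so that every element keeps its own tzone attribute.
    pick_each <- function(condition, true, false) {
      stopifnot(is.logical(condition), is.list(true), is.list(false),
                length(condition) == length(true),
                length(true) == length(false))
      Map(function(cond, t, f) if (cond) t else f, condition, true, false)
    }

    picked <- pick_each(c(TRUE, FALSE),
                        list(gmt_time, gmt_time),
                        list(sgp_time, sgp_time))

    picked_tz <- vapply(picked, function(p) attr(p, "tzone"), character(1))
    picked_tz
    # [1] "GMT"            "Asia/Singapore"
    picked_shown <- vapply(picked, function(p) format(p, "%Y-%m-%d %H:%M"), character(1))
    picked_shown
    # [1] "2020-06-01 17:45" "2020-06-01 17:45"
    ```

    Both elements keep the 17:45 wall-clock time, each in its own zone, which is the outcome the question expected. The cost is that the result is a list, so it cannot be used directly where a POSIXct vector is required.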