I have a weird problem (maybe I'm missing something?) when trying to force a timezone within if_else (as ifelse does not seem to handle POSIXct well). It seems to force the timezone only when the condition is TRUE, but to convert when it is FALSE. Why? How can I fix it?
library(lubridate)
library(dplyr)
some_date = ymd_hm("2020-06-01 17:45", tz = "America/New_York")
if_else(TRUE, force_tz(some_date, tz = "GMT"), force_tz(some_date, tz = "Singapore"))
[1] "2020-06-01 17:45:00 GMT"
if_else(FALSE, force_tz(some_date, tz = "GMT"), force_tz(some_date, tz = "Singapore"))
[1] "2020-06-01 09:45:00 GMT"
I would expect the same outcome as running force_tz alone:
# if TRUE
force_tz(some_date, tz = "GMT")
[1] "2020-06-01 17:45:00 GMT"
# if FALSE
force_tz(some_date, tz = "Singapore")
[1] "2020-06-01 17:45:00 +08"
Thanks!
The culprit is how dplyr::if_else builds its output vector.
First, my original comment about vectors and TZ still stands, and is still at the heart of this problem. For the record:
When you're dealing with POSIXt in a vector, the TZ is an attribute of the whole vector, not of each individual element. This means that either (a) you must accept that all timestamps within a vector will have the same TZ; or (b) you need to adapt your process to deal with a list of timestamps, in which case each time can have its own TZ.
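To make the "one TZ per vector" point concrete, here is a minimal base-R sketch. The variable names a, b, and v are purely illustrative, and which tzone (if any) the combined vector ends up with may depend on your R version, so the example only inspects it rather than asserting a value.

```r
# Two timestamps with the same wall-clock reading in different time zones.
a <- as.POSIXct("2020-06-01 17:45:00", tz = "GMT")
b <- as.POSIXct("2020-06-01 17:45:00", tz = "Asia/Singapore")

attr(a, "tzone")  # "GMT"
attr(b, "tzone")  # "Asia/Singapore"

# Combining them gives ONE vector with ONE set of attributes:
# there is no slot for a per-element time zone.
v <- c(a, b)
attributes(v)

# The underlying instants survive; only the per-element TZ labels are lost.
as.numeric(a) - as.numeric(b)                      # 28800 seconds (8 hours)
as.numeric(v) == c(as.numeric(a), as.numeric(b))   # TRUE TRUE
```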
If you look at the source of if_else:
function (condition, true, false, missing = NULL)
{
  if (!is.logical(condition)) {
    bad_args("condition", "must be a logical vector, not {friendly_type_of(condition)}")
  }
  out <- true[rep(NA_integer_, length(condition))]
The last line prepopulates the out vector with NA variants of the first ("true") vector. (This is necessary because R really has at least 6 types of NA: logical (NA), integer (NA_integer_), real/float (NA_real_), string (NA_character_), date (c.Date(NA)), and time (c.POSIXct(NA)); so how one forms a vector of NA is important.) Once the vector of NAs is prepopulated, though, realize that it is based on the first vector, so that vector's attributes are brought into the out vector.
Sys.time()
# [1] "2020-06-01 09:02:06 PDT"
now <- Sys.time()
attr(now, "tzone") <- "GMT"
dput(now)
# structure(1591027335.41804, class = c("POSIXct", "POSIXt"), tzone = "GMT")
dput(now[NA])
# structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "GMT")
(See how tzone= is still the same.) This means that the output vector (when operating on POSIXt vectors) will always carry forward the TZ of the true argument to if_else.
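From here, if_else works by replacement (using its internal replace_with, which effectively just does out[condition] <- true[condition] followed by out[!condition] <- false[!condition]). Replacement does not affect the TZ; in fact, the numeric equivalents of the false times are assimilated without regard to their TZ. Granted, the "absolute time in the world" for the false vector is preserved. That is why 17:45 in Singapore comes back as 09:45 GMT in your FALSE case: it is the same instant, displayed in the TZ of the true argument.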
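The mechanism described above can be reproduced in base R alone, without dplyr. This is only a sketch of the two steps (prepopulate with NA, then replace), not the actual dplyr implementation; the variable names mirror the if_else arguments for readability.

```r
# Same wall-clock reading, two different time zones.
true  <- as.POSIXct("2020-06-01 17:45:00", tz = "GMT")
false <- as.POSIXct("2020-06-01 17:45:00", tz = "Asia/Singapore")
condition <- FALSE

# Step 1: prepopulate with NA, inheriting the attributes of `true`.
out <- true[rep(NA_integer_, length(condition))]
attr(out, "tzone")  # "GMT"

# Step 2: replace. Only the underlying number of seconds is copied in;
# the tzone attribute of `out` is left as it was.
out[!condition] <- false[!condition]

attr(out, "tzone")                    # "GMT"
format(out, "%Y-%m-%d %H:%M:%S %Z")   # "2020-06-01 09:45:00 GMT"
as.numeric(out) == as.numeric(false)  # TRUE: same instant, different label
```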
If each element really needs its own TZ, the workaround is to change your workflow to deal with a list of POSIXt instead of a vector. if_else still works there.
now
# [1] "2020-06-01 16:02:15 GMT"
now1 <- list(now, now + 1)
now2 <- list(now + 86400, now + 86401)
now1
# [[1]]
# [1] "2020-06-01 16:02:15 GMT"
# [[2]]
# [1] "2020-06-01 16:02:16 GMT"
now2
# [[1]]
# [1] "2020-06-02 16:02:15 GMT"
# [[2]]
# [1] "2020-06-02 16:02:16 GMT"
attr(now1[[2]], "tzone") <- "Singapore"
attr(now2[[2]], "tzone") <- "US/Pacific"
now1
# [[1]]
# [1] "2020-06-01 16:02:15 GMT"
# [[2]]
# [1] "2020-06-02 00:02:16 +08"
now2
# [[1]]
# [1] "2020-06-02 16:02:15 GMT"
# [[2]]
# [1] "2020-06-02 09:02:16 PDT"
if_else(c(TRUE, FALSE), now1, now2)
# [[1]]
# [1] "2020-06-01 16:02:15 GMT"
# [[2]]
# [1] "2020-06-02 09:02:16 PDT"
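If you would rather not depend on how if_else treats lists, the same element-wise choice can be sketched in base R with Map. The names t_list, f_list, cond, and picked are hypothetical, and the timestamps are made up for the example.

```r
# Each list element is a length-1 POSIXct carrying its own tzone.
t_list <- list(
  as.POSIXct("2020-06-01 16:02:15", tz = "GMT"),
  as.POSIXct("2020-06-02 00:02:16", tz = "Asia/Singapore")
)
f_list <- list(
  as.POSIXct("2020-06-02 16:02:15", tz = "GMT"),
  as.POSIXct("2020-06-02 09:02:16", tz = "America/Los_Angeles")
)
cond <- c(TRUE, FALSE)

# Pick element by element; each chosen object keeps its own attributes.
picked <- Map(function(cnd, t, f) if (cnd) t else f, cond, t_list, f_list)

# One TZ per element, as desired.
vapply(picked, function(x) attr(x, "tzone"), character(1))
# [1] "GMT"                 "America/Los_Angeles"
```

The result is a list, so downstream code has to iterate over it (for example with lapply or vapply) rather than treat it as a single POSIXct vector.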