I have a dataframe that contains the columns start_time and timezone (example shown below). The start_time is recorded in UTC. I want to create a new column called start_time_local that contains the start time in the local timezone for that record.
I've tried many examples, including format(), force_tz(), with_tz(), etc., but most examples I've seen show how to convert ALL timestamps to the same timezone, not each timestamp to its respective timezone.
+---------------------+---------------------+
| start_time | timezone |
+---------------------+---------------------+
| 2020-07-07 16:01:15 | Europe/Dublin |
| 2020-07-07 21:01:28 | America/Los_Angeles |
| 2020-07-20 12:45:33 | America/New_York |
| 2020-07-24 16:00:32 | America/Los_Angeles |
| 2020-07-09 14:00:39 | Europe/London |
| 2020-07-16 20:30:30 | America/Los_Angeles |
| 2020-07-29 14:03:09 | Europe/London |
| 2020-07-27 20:59:32 | America/Los_Angeles |
| 2020-07-20 16:09:54 | America/Denver |
| 2020-07-21 09:51:04 | Europe/Dublin |
+---------------------+---------------------+
# example data
df <- structure(list(start_time = structure(c(1594162875, 1594180888,
1595274333, 1595631632, 1594328439, 1594956630, 1596056589, 1595908772,
1595286594, 1595350264), class = c("POSIXct", "POSIXt"), tzone = ""),
timezone = c("Europe/Dublin", "America/Los_Angeles", "America/New_York",
"America/Los_Angeles", "Europe/London", "America/Los_Angeles",
"Europe/London", "America/Los_Angeles", "America/Denver",
"Europe/Dublin")), row.names = c(NA, -10L), class = "data.frame")
Unfortunately, you cannot have different time zones within a single POSIXct vector, because the timezone is stored as a single attribute (tzone) that applies to the whole vector. If you try to write multiple timezones to this attribute, the S3 methods for POSIXct will stop working.
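As a minimal sketch of that limitation, and of the simplest way around it if a character column is acceptable: the names utc_times, zones and local_chr below are made up for illustration and the two timestamps are assumed to be in UTC. Each element is formatted separately in its own zone, so the result is a character vector rather than a POSIXct one.

```r
# Two UTC timestamps and the zone each one should be displayed in (illustrative values)
utc_times <- as.POSIXct(c("2020-07-07 16:01:15", "2020-07-07 21:01:28"), tz = "UTC")
zones     <- c("Europe/Dublin", "America/Los_Angeles")

# The whole vector shares one time zone attribute
attr(utc_times, "tzone")

# Format each element in its own zone; the result is character, not POSIXct
local_chr <- vapply(
  seq_along(utc_times),
  function(i) format(utc_times[i], format = "%Y-%m-%d %H:%M:%S", tz = zones[i]),
  character(1)
)
local_chr
```

Because local_chr is plain text, it is fine for display or export but cannot be used for date arithmetic without converting back.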
If this is something you are very keen to pursue you can write a new S3 class to handle this kind of problem. The very bare bones of such a class would look something like this:
POSIX_multi_tz <- function(UTC_times, time_zones)
{
  structure(as.numeric(UTC_times),
            class = c("POSIXmulti", "POSIXt"),
            tz = time_zones)
}
format.POSIXmulti <- function(x, ...)
{
  unlist(mapply(function(a, b) {
    format(as.POSIXct(a, origin = "1970-01-01"), tz = b, usetz = TRUE)
  }, a = x, b = attr(x, "tz"), SIMPLIFY = FALSE))
}
print.POSIXmulti <- function(x, ...)
{
  print(format(x, ...))
}
This would allow for the following behaviour:
df$new_time <- POSIX_multi_tz(df$start_time, df$timezone)
df
#> start_time timezone new_time
#> 1 2020-07-08 00:01:15 Europe/Dublin 2020-07-08 00:01:15 IST
#> 2 2020-07-08 05:01:28 America/Los_Angeles 2020-07-07 21:01:28 PDT
#> 3 2020-07-20 20:45:33 America/New_York 2020-07-20 15:45:33 EDT
#> 4 2020-07-25 00:00:32 America/Los_Angeles 2020-07-24 16:00:32 PDT
#> 5 2020-07-09 22:00:39 Europe/London 2020-07-09 22:00:39 BST
#> 6 2020-07-17 04:30:30 America/Los_Angeles 2020-07-16 20:30:30 PDT
#> 7 2020-07-29 22:03:09 Europe/London 2020-07-29 22:03:09 BST
#> 8 2020-07-28 04:59:32 America/Los_Angeles 2020-07-27 20:59:32 PDT
#> 9 2020-07-21 00:09:54 America/Denver 2020-07-20 17:09:54 MDT
#> 10 2020-07-21 17:51:04 Europe/Dublin 2020-07-21 17:51:04 IST
Beware though: you would still have a bit of work to do to be able to use this class the way you would use POSIXct objects. You can still use arithmetic functions to add and subtract seconds, but if you use the lubridate package or similar, many of the methods will not work for this class unless you define various Ops to handle adding durations, periods etc.
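As one example of the extra work involved, here is a hedged sketch of a subsetting method. It is an assumption about what such a class might need, not part of the class as written above: without it, taking a subset with [ would drop the per-element time zones. The constructor and format method are repeated (with the format method looping over the plain numeric values) so that the sketch runs on its own.

```r
# Constructor, repeated so this sketch is self-contained
POSIX_multi_tz <- function(UTC_times, time_zones) {
  structure(as.numeric(UTC_times),
            class = c("POSIXmulti", "POSIXt"),
            tz = time_zones)
}

# Format method: each value is formatted in its own zone
format.POSIXmulti <- function(x, ...) {
  unlist(mapply(function(a, b) {
    format(as.POSIXct(a, origin = "1970-01-01"), tz = b, usetz = TRUE)
  }, a = as.numeric(x), b = attr(x, "tz"), SIMPLIFY = FALSE))
}

# Hypothetical subsetting method: subset the values and the zones together
`[.POSIXmulti` <- function(x, i, ...) {
  POSIX_multi_tz(unclass(x)[i], attr(x, "tz")[i])
}

# Usage: the second element keeps its own zone after subsetting
x <- POSIX_multi_tz(c(1594162875, 1594180888),
                    c("Europe/Dublin", "America/Los_Angeles"))
y <- x[2]
attr(y, "tz")
format(y)
```

Similar methods would likely be needed for c(), rep(), comparison and arithmetic before the class behaved like POSIXct in everyday use.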