Search code examples
rdatetimelubridateutc

How to convert a column of UTC timestamps into several different timezones?


I have a dataset with dates stored in the DB as UTC, however, the timezone is actually different.

mydat <- data.frame(
  time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
  timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)

I want to apply the timezone to the UTC saved timestamps, over the entire column.

I looked into the with_tz function in the lubridate package, but I don't see how to reference the "timezone" column, rather than hardcoding in a value.

Such as if I try

with_tz(mydat$time_stamp, tzone = mydat$timezone)

I get the following error

Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value`

However, if I try

mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")

that will render a new column without issue. How can I do this just referencing column values?


Solution

  • The following should do what you ask for:

    Code

    mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC", 
                                     "2022-08-01 17:00:00 UTC",
                                     "2022-08-02 22:30:00 UTC",
                                     "2022-08-04 05:00:00 UTC", 
                                     "2022-08-05 02:00:00 UTC"),
                        timezone=c("America/Chicago", "America/New_York", 
                                   "America/Los_Angeles", "America/Denver",
                                   "America/New_York"))
    
    mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
    mydat$format <- ""
    for (i in seq_len(nrow(mydat)))
        mydat[i, "format"] <- strftime(mydat[i,"utc"], 
                                       "%Y-%m-%d %H:%M:%S",
                                       tz=mydat[i,"timezone"])
    

    Output

    > mydat
                   time_stamp            timezone                 utc              format
    1 2022-08-01 05:00:00 UTC     America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
    2 2022-08-01 17:00:00 UTC    America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
    3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
    4 2022-08-04 05:00:00 UTC      America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
    5 2022-08-05 02:00:00 UTC    America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
    > 
    

    Comment

    We first parse your data as UTC, I once wrote a helper function for that in my anytime package (there are alternatives but this is how I do it...). We then need to format from the given (numeric !!) UTC representation to the give timezone. We need a loop for this as the tz argument to strftime() is not vectorized.