I have a DataFrame with data like this:
+----------------------+------------+
| utc_timestamp        | tz_locale  |
+----------------------+------------+
| 2021-07-16T10:00:00Z | US/Eastern |
| 2021-07-19T15:00:00Z | US/Central |
+----------------------+------------+
I want to convert the timestamps from UTC (offset +00:00) to the local time given by the value in the tz_locale column of that row:
+---------------------------+------------+
| utc_timestamp             | tz_locale  |
+---------------------------+------------+
| 2021-07-16T06:00:00-04:00 | US/Eastern |
| 2021-07-19T10:00:00-05:00 | US/Central |
+---------------------------+------------+
I tried writing it like this:
val new_df = df.withColumn("utc_timestamp", from_utc_timestamp(df.col("utc_timestamp"), df.col("tz_locale")))
It appears that from_utc_timestamp wants a string constant for the second argument, so it can only convert the entire column to a single time zone. But I need to convert each row dynamically, based on the value of another column in that row.
I think this is possible in newer versions of Spark (from_utc_timestamp is overloaded with a version that takes a (Column, Column) pair), but I am on 2.3 and upgrading is not an option. How can this be done in Spark 2.3? It seems like a fairly common task, but I can't figure it out and couldn't find anything by searching.
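For reference, I believe this is all it would take on Spark 2.4+, where from_utc_timestamp gained a (Column, Column) overload (not an option for me on 2.3):

import org.apache.spark.sql.functions.{col, from_utc_timestamp}

// Spark 2.4+ only: the time zone argument may itself be a Column,
// so each row is shifted by its own tz_locale value
val new_df = df.withColumn("utc_timestamp", from_utc_timestamp(col("utc_timestamp"), col("tz_locale")))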
For Spark 2.3 or older, you can fall back to the less type-constrained SQL expression via expr:

import org.apache.spark.sql.functions.expr

df.withColumn("utc_timestamp", expr("from_utc_timestamp(utc_timestamp, tz_locale)")).show
+-------------------+----------+
| utc_timestamp| tz_locale|
+-------------------+----------+
|2021-07-15 23:00:00|US/Eastern|
|2021-07-19 03:00:00|US/Central|
+-------------------+----------+
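In case it helps, here is a minimal self-contained sketch of the whole flow that you could paste into spark-shell; the example rows and column names come from the question, while the to_timestamp parsing step is an assumption about how the raw strings were loaded:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, to_timestamp}

val spark = SparkSession.builder().appName("localize-timestamps").master("local[*]").getOrCreate()
import spark.implicits._

// Build a frame shaped like the question's, parsing the raw ISO-8601 strings to TimestampType
val df = Seq(
  ("2021-07-16T10:00:00Z", "US/Eastern"),
  ("2021-07-19T15:00:00Z", "US/Central")
).toDF("utc_timestamp", "tz_locale")
  .withColumn("utc_timestamp", to_timestamp($"utc_timestamp"))

// expr hands the call to the SQL parser, which resolves both arguments as columns;
// the Spark 2.3 Scala signature from_utc_timestamp(Column, String) cannot do this
val localized = df.withColumn("utc_timestamp", expr("from_utc_timestamp(utc_timestamp, tz_locale)"))

localized.show(false)

Note that the values printed by show still depend on the session time zone (spark.sql.session.timeZone), since Spark renders TimestampType columns in that zone.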