I am new to SparkR, learning as I go, and on version 2.2.0-SNAPSHOT. I have a SparkDataFrame with one column of UTC timestamps and one column of time zones. For example:
mydates <- c("2012-12-22 14:23:05", "2015-01-19 16:37:42", "2017-03-02 08:22:55")
mytimeZones <- c("America/Chicago", "America/New_York", "Pacific/Galapagos")
dfr <- data.frame(date = mydates, timezone = mytimeZones)
dfs <- SparkR::createDataFrame(dfr)
What I would like to do is make a column of local times for each observation in my dataset. I saw that the from_utc_timestamp() function takes a column of timestamps and a string representing the desired target timezone. My problem is that I have columns of both. Given this, what would the workflow need to be? I am hoping someone with more SparkR experience can shed some light on how to tackle this. At the moment, I have tried things like this:
dftest <- withColumn(dfs, "LocalTime",
  SparkR:::foreach(dfs, function(x) { from_utc_timestamp(y = x$date, x = x$timezone) })
)
But with no success so far. I'm hoping someone out there has tackled this problem in SparkR and could help. Thanks for your time. Kind regards, Nate
You can use either selectExpr:
selectExpr(dfs, "*", "from_utc_timestamp(date, timezone) AS LocalTime")
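To sanity-check the conversion, you can collect a few rows. A minimal sketch, assuming an active SparkR session (e.g. started with sparkR.session()) and the dfs defined in the question:

dftest <- selectExpr(dfs, "*", "from_utc_timestamp(date, timezone) AS LocalTime")
head(dftest)
# For example, "2012-12-22 14:23:05" UTC in America/Chicago (UTC-6 on that date)
# should come back as 2012-12-22 08:23:05.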
or expr:
withColumn(dfs, "LocalTime", expr("from_utc_timestamp(date, timezone)"))
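expr simply parses a SQL expression string into a Column, so the result composes with any other Column operation. A small sketch, again assuming the dfs from the question:

dftest <- withColumn(dfs, "LocalTime", expr("from_utc_timestamp(date, timezone)"))
printSchema(dftest)
# LocalTime should appear with type timestamp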
In a simpler case, where the timezone is a constant string, calling from_utc_timestamp directly would do the trick:
withColumn(dfs, "LocalTime",
           from_utc_timestamp(column("date"), "America/New_York"))
Unfortunately there is no from_utc_timestamp with signature (Column, Column) => Column, hence the SQL-ish solution.
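If you prefer writing plain SQL, the same expression also works through a temporary view. A sketch; the view name utc_times here is arbitrary:

createOrReplaceTempView(dfs, "utc_times")
localized <- sql("SELECT *, from_utc_timestamp(date, timezone) AS LocalTime FROM utc_times")
head(localized)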