I am trying to use this method to explode a field in a dataframe using SparkR. My code is:
Sys.setenv(SPARK_HOME="/usr/hdp/2.6.0.3-8/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
hc <- sparkRHive.init(sc)
df <- sql(hc, "SELECT * FROM tweetsorc5")
library(tidyverse)
dat <- df %>% mutate(a=explode(df$user)) %>% select("created_at", "a.utc_offset")
But I get the error:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "DataFrame"
I cannot find any help for this.
It is not possible. SparkDataFrame doesn't implement the same interface as data.frame.
If you want to use dplyr with Spark, you should use sparklyr, not SparkR.
With SparkR, use SparkR::withColumn:
withColumn(df, "a", explode(df$user))
or SparkR::mutate.
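Put together, the whole pipeline from the question might look like the sketch below. It assumes the session from the question is still active (so df is the result of the sql() call) and that user is an array or map column, which is what explode requires. If user is a struct instead, the nested field can be selected directly as "user.utc_offset" without explode. The names exploded and dat are only illustrative. Because tidyverse was attached after SparkR, dplyr's mutate and select mask the SparkR versions, so the calls are namespace-qualified here.

```r
# Assumes `df` is the SparkR DataFrame created in the question:
#   df <- sql(hc, "SELECT * FROM tweetsorc5")
# Assumes `user` is an array or map column (required by explode).

# Qualify each call with SparkR:: so dplyr's masking versions are not used.
exploded <- SparkR::withColumn(df, "a", SparkR::explode(df$user))

# Select the timestamp and the nested field of the exploded column.
dat <- SparkR::select(exploded, "created_at", "a.utc_offset")

# Bring the first rows back to the driver to inspect the result.
SparkR::head(dat)
```

The pipe from the question is dropped here only to keep the two steps easy to inspect separately. Nested calls or %>% work as well, provided the functions being called are the SparkR ones.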