
How do I use mutate on DataFrame in SparkR?


I am trying to use this method to explode a field in a dataframe using SparkR. My code is:

Sys.setenv(SPARK_HOME="/usr/hdp/2.6.0.3-8/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
hc <- sparkRHive.init(sc)
df <- sql(hc, "SELECT * FROM tweetsorc5")
library(tidyverse)
dat <- df %>%
  mutate(a = explode(df$user)) %>%
  select("created_at", "a.utc_offset")

but I get the error:

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "DataFrame"

I cannot find any help for this.


Solution

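  • This is not possible with dplyr. A SparkDataFrame doesn't implement the same interface as a data.frame, so dplyr's verbs have no methods for it. Because tidyverse is loaded after SparkR, dplyr's mutate masks SparkR's mutate, which is why the call fails with the mutate_ error.

    If you want to use dplyr with Spark, you should use sparklyr, not SparkR.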

    With SparkR use SparkR::withColumn:

    withColumn(df, "a", explode(df$user))
    

    or SparkR::mutate, written with its namespace prefix so that the dplyr version is not picked up.
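
As a rough sketch of the SparkR-only route, assuming `df` is the DataFrame loaded in the question and that `user` is an array (or map) column, which is what `explode` requires. The names `with_a` and `with_a2` are only illustrative:

```r
library(SparkR)

# Assumption: `df` is the DataFrame created in the question via sql(hc, ...).
# Namespace-qualified calls avoid the dplyr functions that
# library(tidyverse) places in front of SparkR's.

# Option 1: withColumn adds the exploded column under the name "a".
with_a <- SparkR::withColumn(df, "a", SparkR::explode(df$user))

# Option 2: SparkR's own mutate, used the same way.
with_a2 <- SparkR::mutate(df, a = SparkR::explode(df$user))

# Pick the fields of interest; nested fields are addressed with a dot path.
dat <- SparkR::select(with_a, "created_at", "a.utc_offset")
SparkR::head(dat)
```

If `user` turns out to be a plain struct rather than an array, `explode` should not be needed at all, and selecting `"user.utc_offset"` directly from `df` is probably enough.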