Search code examples
rapache-sparksparkr

Ungroup SparkR data frame


I have a spark data frame:

library(SparkR); library(magrittr)

as.DataFrame(mtcars) %>%
   groupBy("am")

How can I ungroup this data frame? There doesn't seems to be any ungroup function in the SparkR library!


Solution

  • There doesn't seems to be any ungroup function in the SparkR library

    That's because groupBy doesn't have the same meaning as group_by in dplyr.

    SparkR::group_by / SparkR::groupBy returns not a SparkDataFrame but a GroupData object which corresponds to GROUP BY clause in SQL. To convert it back to a SparkDataFrame you should call SparkR::agg (or if you prefer dplyr nomenclature SparkR::summarize) which corresponds to SELECT component of the SQL query.

    Once you aggregate you get back SparkDataFrame and grouping is no longer present.

    Additionally SparkR::groupBy doesn't have dplyr group_by(...) %>% mutate(...) equivalent. Instead we use window functions with frame definition.

    So the take away message is - if you don't plan to aggregate don't use groupBy.