
Where is the complete documentation of the SparkR GroupedData object?


Many things can be done with the groupBy function in SparkR.

Here's an example from the documentation:

# Compute the average for all numeric columns grouped by department.

avg(groupBy(df, "department"))
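To show what the GroupedData returned by groupBy can do beyond avg, here is a minimal sketch. It assumes a local Spark 2.x or later installation that SparkR can find, and it uses a small hypothetical data frame in place of the question's df. It relies only on GroupedData methods that SparkR documents (avg, count, agg), and each of them returns a new SparkDataFrame.

```r
library(SparkR)

# Start a local Spark session (assumes Spark 2.x+ is installed and visible to SparkR).
sparkR.session(master = "local[1]")

# Hypothetical toy data standing in for the question's `df`.
df <- createDataFrame(data.frame(
  department = c("a", "a", "b"),
  salary = c(100, 200, 300),
  stringsAsFactors = FALSE
))

# groupBy() returns a GroupedData object, not a SparkDataFrame.
gd <- groupBy(df, "department")
print(class(gd))

# Each aggregation on GroupedData yields a SparkDataFrame; collect() brings it into R.
avgs   <- collect(avg(gd))                               # mean of every numeric column
counts <- collect(count(gd))                             # number of rows per group
named  <- collect(agg(gd, max_salary = max(df$salary)))  # named Column aggregate
byname <- collect(agg(gd, salary = "sum"))               # aggregate by function name

print(avgs)
print(counts)
print(named)
print(byname)

sparkR.session.stop()
```

The two agg calls show the two documented styles: a named Column expression, which controls the output column name, and a column = "function" pair, which lets Spark choose the name.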

But I'm very curious about the "GroupedData" object generated by the groupBy function, which is mentioned in the groupBy documentation and also has its own page.

According to that documentation, the following code will generate a GroupedData object:

groupBy(df, "department")

Unfortunately, the page for the "GroupedData" object appears incomplete, or I don't know where to find the full documentation for it. The page says it is "A Java object reference to the backing Scala GroupedData", so I tried searching the Scala documentation on spark.apache.org, but found nothing there.

I'm looking for a list of the "GroupedData" class members and methods, similar to the documentation I have found for other programming languages. Depending on what I find, I may have some novel ways to use this object in the analysis I am doing in SparkR. An answer to this question would also help me with many similar questions I have about finding documentation for other SparkR objects.


Solution

  • You can always refer to the PySpark documentation if you find the SparkR documentation inadequate; at least, that's what I do :). The two share many common APIs.
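You can also ask R itself which members and methods GroupedData has, because SparkR defines it as an S4 class. The sketch below uses only base R introspection (getClass, slotNames, showMethods), so it needs the SparkR package loaded but no running Spark session. The exact method list depends on your Spark version. You should see entries such as agg, avg, count, max, mean, min and sum, and in newer releases also pivot and gapply. On the JVM side, Spark 2.0 renamed the Scala class from GroupedData to RelationalGroupedDataset, which may explain why searching the Scala docs for GroupedData finds nothing.

```r
library(SparkR)

# GroupedData is an S4 class: print its definition (slots and their types).
print(getClass("GroupedData"))

# Slot names; the slot holds the reference to the backing JVM object.
print(slotNames("GroupedData"))

# List every S4 method defined for GroupedData in the attached packages.
showMethods(classes = "GroupedData")
```

Each method name printed by showMethods can then be looked up with R's help system, for example ?agg.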