I created a DataFrame from mtcars, grouped it by gear and cyl, and then calculated the max of hp and disp. Something seems to be going wrong in the group-by: there should be 8 groups, but I get only 6.
library(SparkR)
xx=as.DataFrame(sqlContext, data = mtcars)
head(agg(groupBy(xx, "gear", "cyl"), hp = 'max'))
  gear cyl max(hp)
1    3   8     245
2    5   4     113
3    3   4      97
4    4   4     109
5    5   6     175
6    3   6     110
Update 1:
I have another question. The documentation for groupBy gives this example:
## Examples
## Not run:
# Compute the average for all numeric columns grouped by department.
avg(groupBy(df, "department"))
# Compute the max age and average salary, grouped by department and gender.
agg(groupBy(df, "department", "gender"), salary="avg", "age" -> "max")
## End(Not run)
Similarly, for mtcars I came up with:
agg(groupBy(xx, "gear", "cyl"), qsec ="avg", "disp" -> "max")
First, my understanding is that this should return the max of disp, but the code doesn't work. It fails with this error:
unable to find an inherited method for function ‘groupBy’ for signature ‘"function"’
Second, the code does work with = in place of ->. So is there a typo in the documentation, or something else?
My SparkR version is SparkR_1.6.1.
Your aggregation is correct, but you wrap it in head, which shows only the first 6 rows. Replace head with collect to get all of them, like this:
df <- as.DataFrame(mtcars)
gp = agg(groupBy(df, df$gear, df$cyl), hp = 'max')
collect(gp)
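As a quick sanity check that the grouping itself is fine, here is a minimal sketch in base R only (no Spark session needed). It assumes the local mtcars data set that ships with R, and gp_local is just an illustrative name; it shows that the data has 8 (gear, cyl) groups and that head keeps at most 6 rows by default.

```r
# Base R only; mtcars ships with R, so no Spark session is required.
# Max hp for every (gear, cyl) combination present in the data.
gp_local <- aggregate(hp ~ gear + cyl, data = mtcars, FUN = max)

nrow(gp_local)            # 8: one row per (gear, cyl) group
nrow(head(gp_local))      # 6: head() defaults to the first 6 rows
nrow(head(gp_local, 10))  # 8: asking for more rows returns all groups
```

SparkR's head also defaults to 6 rows, so passing a larger row count, for example head(gp, 10), should be another way to see every group without collecting.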
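Just a remark: I am using Spark 2.0.2, where as.DataFrame no longer takes the sqlContext argument. On SparkR 1.6.1, keep as.DataFrame(sqlContext, mtcars) as in your question.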
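On the second part of the question: in R, -> is the rightward assignment operator, not a key-to-value arrow as in Scala, so "disp" -> "max" does not pass a name/value pair. The sketch below uses a hypothetical helper, show_args, in plain R (no Spark) to show what a function receives in each case. It only illustrates how R evaluates the arguments and makes no claim about SparkR internals.

```r
# show_args is a hypothetical helper: it just returns its arguments as a list.
show_args <- function(...) list(...)

# local() keeps the assignment made by `->` out of the global environment.
arrow_form <- local(show_args(qsec = "avg", "disp" -> "max"))
named_form <- show_args(qsec = "avg", disp = "max")

names(arrow_form)  # the second argument arrives without a name
arrow_form[[2]]    # its value is the string "disp", not a disp/max pair
names(named_form)  # both arguments are named
```

With named arguments, the call from the question becomes agg(groupBy(xx, "gear", "cyl"), qsec = "avg", disp = "max"), which is the = form you report as working.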