I have a Spark DataFrame in R with the structure below:
Var1     Var2     Var3       Var4     Group
98.64    32.35    11906.91   08.65    A
94.83    29.36    17287.57   06.01    B
99.94    35.36    30411.85   08.82    C
99.45    34.58    18267.26   10.09    C
99.93    36.64    23560.04   07.34    A
99.66    48.81    42076.44   08.44    B
99.96    27.38    18474.01   11.39    A
97.49    25.28    14615.50   06.60    B
98.98    32.50    10282.90   07.71    C
99.57    31.54    12725.56   06.17    C
99.91    26.46    10990.13   06.17    C
This is a representative dataset; the real one has a huge number of records and more than 200 columns. Can someone please help me produce the result set below? For a local data frame in R this is very easy with dplyr (see the sketch after the expected output), but doing it on a Spark DataFrame seems much harder.
Group   Average_Var1   Average_Var2   Average_Var3   Average_Var4
A       99.51          32.13          17980.34       9.13
B       97.32          34.42          24659.83       6.89
C       99.57          32.10          16535.54       7.78
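For reference, the local-data-frame version mentioned above is straightforward. A minimal sketch using only the first three sample rows; df_local is a hypothetical name for the local copy:

library(dplyr)

# Small local copy of the sample data (first three rows only)
df_local <- data.frame(
  Var1  = c(98.64, 94.83, 99.94),
  Var2  = c(32.35, 29.36, 35.36),
  Var3  = c(11906.91, 17287.57, 30411.85),
  Var4  = c(8.65, 6.01, 8.82),
  Group = c("A", "B", "C")
)

# One mean per group for every non-grouping column
df_local %>%
  group_by(Group) %>%
  summarize_all(mean)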
Using sparklyr, try this:
df %>%
  group_by(Group) %>%
  summarize_all(mean)
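This works because sparklyr translates the dplyr verbs into Spark SQL, so the aggregation runs inside Spark rather than in R. A more complete, self-contained sketch, assuming a local Spark installation and reusing df_local from the earlier example (the master value and the table name "df" are placeholders):

library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (adjust master for a real cluster)
sc <- spark_connect(master = "local")

# Copy the sample data into Spark; in practice df would already be a Spark table
df <- copy_to(sc, df_local, name = "df", overwrite = TRUE)

# The grouped aggregation is executed by Spark via the SQL translation
result <- df %>%
  group_by(Group) %>%
  summarize_all(mean)

# Pull the small aggregated result back into an R data frame
collect(result)

spark_disconnect(sc)

Since summarize_all applies the function to every non-grouping column, this scales to the 200+ columns in the real dataset without listing them by name.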