
Find Mean of all the numeric variables of a Spark dataframe in R


I have a Spark DataFrame in R with the following structure:

Var1     Var2     Var3        Var4     Group
98.64    32.35    11906.91    08.65    A
94.83    29.36    17287.57    06.01    B
99.94    35.36    30411.85    08.82    C
99.45    34.58    18267.26    10.09    C
99.93    36.64    23560.04    07.34    A
99.66    48.81    42076.44    08.44    B
99.96    27.38    18474.01    11.39    A
97.49    25.28    14615.50    06.60    B
98.98    32.50    10282.90    07.71    C
99.57    31.54    12725.56    06.17    C
99.91    26.46    10990.13    06.17    C

This is a representative dataset; the actual number of records is quite large, and there are more than 200 columns.

Can someone please help me produce the following result set? For a local data frame in R, doing this with dplyr is very easy, but doing the same on a Spark DataFrame does not seem as straightforward.

Group    Average_Var1    Average_Var2    Average_Var3    Average_Var4
A        99.51           32.13           17980.34        9.13
B        97.32           34.42           24659.83        6.89
C        99.57           32.10           16535.54        7.78
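For reference, the local-data-frame version of this aggregation is a short dplyr pipeline. The sketch below rebuilds the sample rows above as a plain data frame (the name `local_df` is just a placeholder) and computes the desired group-wise means:

    library(dplyr)

    # Local stand-in for the sample rows shown above
    local_df <- data.frame(
      Var1  = c(98.64, 94.83, 99.94, 99.45, 99.93, 99.66, 99.96, 97.49, 98.98, 99.57, 99.91),
      Var2  = c(32.35, 29.36, 35.36, 34.58, 36.64, 48.81, 27.38, 25.28, 32.50, 31.54, 26.46),
      Var3  = c(11906.91, 17287.57, 30411.85, 18267.26, 23560.04, 42076.44,
                18474.01, 14615.50, 10282.90, 12725.56, 10990.13),
      Var4  = c(8.65, 6.01, 8.82, 10.09, 7.34, 8.44, 11.39, 6.60, 7.71, 6.17, 6.17),
      Group = c("A", "B", "C", "C", "A", "B", "A", "B", "C", "C", "C")
    )

    # Group-wise mean of every non-grouping column
    local_df %>%
      group_by(Group) %>%
      summarize_all(mean)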

Solution

  • Using sparklyr, try this:

    df %>% group_by(Group) %>% summarize_all(mean)
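
For completeness, here is a minimal end-to-end sketch. It assumes a local Spark connection and reuses the `local_df` stand-in built in the question; the connection object `sc` and the table name `"df"` are placeholders, and in practice `df` would already be a `tbl_spark` pointing at your data:

    library(sparklyr)
    library(dplyr)

    # Connect to Spark (local mode here; point `master` at your cluster instead)
    sc <- spark_connect(master = "local")

    # Copy the example data into Spark; normally `df` is already a tbl_spark
    df <- copy_to(sc, local_df, name = "df", overwrite = TRUE)

    # Grouped means: this is translated to a Spark SQL GROUP BY with AVG()
    # applied to every non-grouping column
    result <- df %>%
      group_by(Group) %>%
      summarize_all(mean)

    # The aggregated result is small, so bring it back into R
    collect(result)

    spark_disconnect(sc)

With 200+ columns, `summarize_all()` keeps the code independent of the column names. If some of those columns are not numeric, you may need to restrict the averaging, for example with `summarize_at()` on a vector of the numeric column names, since predicate-based variants are not always supported on Spark backends.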