I have a production R cluster with RStudio installed. Users are load-balanced onto an R server and write their code there. I also have a separate Spark cluster with 4 nodes. Using sparklyr I can easily connect to my Spark cluster via:
sc <- sparklyr::spark_connect(master = "spark://<my cluster>:7077")
The only thing I notice is that there is some Spark application activity on the R production server when I do this, and I believe it is causing some issues. I have Spark installed on both the R production servers and the Spark cluster, with the same SPARK_HOME location of /var/lib/Spark.
I would like to avoid having Spark on my R servers entirely, so that there is no Spark-related usage there. How do I do that with sparklyr?
Yes, you do need a local Spark installation to submit Spark applications. The rest depends on the mode: