Tags: apache-spark, sparklyr

Do I need a local version of Spark when connecting to another Spark cluster through sparklyr?


I have a production R cluster with RStudio installed. Users are load-balanced onto an R server and write code there. I also have a separate Spark cluster with 4 nodes. Using sparklyr, I can easily connect to my Spark cluster via:

sc <- sparklyr::spark_connect("spark://<my cluster>:7077")

The only thing I notice is that there is some Spark application usage on the R production server when I do this, and I believe it is causing some issues. I have Spark installed on both the R production servers and the Spark cluster, at the same SPARK_HOME location, /var/lib/Spark.
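
For reference, this is roughly how that connection picks up the local install: spark_home defaults to the SPARK_HOME environment variable, so the explicit argument below just makes the default visible (the paths and cluster URL are from my setup):

library(sparklyr)

# The driver JVM is launched from this local installation on the R server,
# which is where the local Spark activity comes from.
Sys.getenv("SPARK_HOME")          # "/var/lib/Spark"
sc <- spark_connect(
  master     = "spark://<my cluster>:7077",
  spark_home = "/var/lib/Spark"
)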

I would like to avoid having Spark on my R servers entirely, so that there is no Spark-related usage on them. How do I do that with sparklyr?


Solution

  • Yes, you do need a local Spark installation to submit Spark applications. Beyond that, it depends on the deploy mode:

    • In client mode, the driver runs on the same node from which you submit the application.
    • In cluster mode, the driver runs on the cluster and there is no local Spark process. However, cluster mode doesn't support interactive processing (see the sketch below).
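
To make the two modes concrete, here is a rough sketch: an interactive sparklyr session behaves like the client-mode case, while cluster mode corresponds to a batch spark-submit. The application jar and class names below are placeholders, not anything from the question.

# Client mode: the driver runs on the machine that submits the application
# (the R server), so a local Spark installation is needed there.
system2("/var/lib/Spark/bin/spark-submit",
        args = c("--master", "spark://<my cluster>:7077",
                 "--deploy-mode", "client",
                 "--class", "com.example.MyBatchJob",  # placeholder class
                 "my-batch-job.jar"))                  # placeholder jar

# Cluster mode: the driver is launched on one of the cluster's workers, so
# no long-running Spark process stays on the R server -- but this is a
# batch submission, not an interactive sparklyr session.
system2("/var/lib/Spark/bin/spark-submit",
        args = c("--master", "spark://<my cluster>:7077",
                 "--deploy-mode", "cluster",
                 "--class", "com.example.MyBatchJob",
                 "my-batch-job.jar"))

Note that even the cluster-mode submission above still calls the spark-submit script from the local /var/lib/Spark installation, which is why some local Spark installation is unavoidable for submitting applications.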