Where does the RpcEnv instance live, and how does every component get its corresponding RpcEnv instance? How do the components connect to each other?
RpcEnv is an RPC environment that is created separately for every component in Spark and is used to exchange messages with other components for remote communication.
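To make the pattern concrete, here is a minimal sketch (it relies on Spark's internal, private[spark] RPC API, so it only compiles inside the org.apache.spark namespace; the endpoint name "greeter", the ports and the two "components" are made up for illustration): each component creates its own RpcEnv, registers RpcEndpoints on it by name, and talks to remote endpoints through RpcEndpointRefs.

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.rpc.{RpcAddress, RpcCallContext, RpcEndpoint, RpcEndpointRef, RpcEnv}

val conf = new SparkConf()

// Component A: create an RpcEnv and register an endpoint under the (made-up) name "greeter".
val serverEnv = RpcEnv.create("server", "localhost", 9999, conf, new SecurityManager(conf))
serverEnv.setupEndpoint("greeter", new RpcEndpoint {
  override val rpcEnv: RpcEnv = serverEnv
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case name: String => context.reply(s"Hello, $name")  // reply to the remote caller
  }
})

// Component B: create its own RpcEnv and look up A's endpoint by address and name.
val clientEnv = RpcEnv.create("client", "localhost", 0, conf,
  new SecurityManager(conf), clientMode = true)
val greeter: RpcEndpointRef = clientEnv.setupEndpointRef(RpcAddress("localhost", 9999), "greeter")
println(greeter.askSync[String]("Spark"))  // blocking request-reply over the RPC environment

serverEnv.shutdown()
clientEnv.shutdown()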
Spark creates the RPC environments for the driver and the executors (by calling the SparkEnv.createDriverEnv and SparkEnv.createExecutorEnv methods, respectively).
SparkEnv.createDriverEnv is used exclusively when SparkContext is created for the driver:
_env = createSparkEnv(_conf, isLocal, listenerBus)
You can create an RPC environment yourself using the RpcEnv.create factory methods (as ExecutorBackends do, e.g. CoarseGrainedExecutorBackend):
val env = SparkEnv.createExecutorEnv(
  driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)
Separate RpcEnvs are also created for the standalone Master and Workers.
How do the components connect to each other?
Not much magic here :) The driver of a Spark application and the standalone Master of a Spark Standalone cluster are created first, and they have no dependency on other components.
When the driver of a Spark application starts, it requests resources from the cluster manager (in the form of resource containers) together with a command to launch executors (the command differs per cluster manager). That launch command carries the connection details (i.e. host and port) of the driver's RpcEndpoint.
See how it works with Hadoop YARN in Client.
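As a rough sketch of that connection step (again assuming the internal private[spark] RPC API; executorRpcEnv, driverHost and driverPort are placeholders for values the executor backend already received through its launch command), the executor turns the driver's host and port into a reference to the driver's scheduler endpoint:

import org.apache.spark.rpc.{RpcAddress, RpcEndpointRef, RpcEnv}

// driverHost and driverPort come from the executor launch command (placeholder names here).
def connectToDriver(executorRpcEnv: RpcEnv, driverHost: String, driverPort: Int): RpcEndpointRef =
  // "CoarseGrainedScheduler" is the name under which the driver registers its scheduler endpoint.
  executorRpcEnv.setupEndpointRef(RpcAddress(driverHost, driverPort), "CoarseGrainedScheduler")

// Once the reference exists, the executor backend can register itself with the driver
// and exchange messages with it (registration, task status updates, etc.).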
It is a similar process for standalone Workers, with the difference that the administrator has to specify the master's URL on the command line:
$ ./sbin/start-slave.sh
Usage: ./sbin/start-slave.sh [options] <master>
Master must be a URL of the form spark://hostname:port
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
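For example (the host and port below are only illustrative), a worker can be started against a standalone Master running on the local machine with:

$ ./sbin/start-slave.sh spark://localhost:7077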