apache-spark, cluster-computing

Role of master in Spark standalone cluster


In a Spark standalone cluster, what exactly is the role of the master (a node started with the start_master.sh script)?

I understand that it is the node that receives jobs from the submit-job.sh script, but what is its role when processing a job?

I can see in the web UI that it always hands the job off to a slave (a node started with start_slave.sh) and does not take part in the processing. Am I right? In that case, should I also run the start_slave.sh script on the same machine as the master to take advantage of its resources (CPU and memory)?

Thanks in advance.


Solution

  • Spark runs in the following cluster modes:

    • Local
    • Standalone
    • Mesos
    • YARN

    The above are the cluster modes that offer resources to Spark applications (an example of selecting each one is shown below).
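
    As a rough sketch, the cluster manager is selected through the --master URL passed to spark-submit. The hostname master-host, the class com.example.MyApp, and my-app.jar below are placeholders:

        # Local mode: driver and executors run in a single JVM on this machine
        spark-submit --master local[*] --class com.example.MyApp my-app.jar

        # Standalone mode: resources are granted by the standalone Master
        spark-submit --master spark://master-host:7077 --class com.example.MyApp my-app.jar

        # YARN mode: resources are granted by the YARN ResourceManager
        spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar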

    Spark standalone mode is a master-slave architecture: there is a Spark Master and there are Spark Workers. The Spark Master runs on one of the cluster nodes, and the Spark Workers run on the slave nodes of the cluster.
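
    For illustration, assuming the standard scripts from the sbin/ directory of a Spark distribution (the question's start_master.sh and start_slave.sh correspond to start-master.sh and start-slave.sh; in Spark 3.1+ the worker script is named start-worker.sh), a minimal standalone cluster could be brought up like this:

        # On the master node: start the standalone Master (web UI on port 8080 by default)
        $SPARK_HOME/sbin/start-master.sh

        # On each slave node: start a Worker and register it with the Master
        $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077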

    The Spark Master (often called the standalone Master) is the resource manager for the Spark standalone cluster; it allocates the resources (CPU, memory, disk, etc.) among the Spark applications. Those resources are used to run the Spark driver and the executors.
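
    A sketch of how an application asks the standalone Master for resources when it is submitted (the flag values and the hostname are only examples):

        # Request 2 GB and 4 cores per executor, capped at 8 cores in total
        # for this application on the standalone cluster
        spark-submit \
          --master spark://master-host:7077 \
          --executor-memory 2g \
          --executor-cores 4 \
          --total-executor-cores 8 \
          --class com.example.MyApp my-app.jar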

    The Spark Workers report the resources available on their slave nodes to the Spark Master.
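
    The resources a Worker advertises to the Master can be limited in conf/spark-env.sh on the slave node; the values below are just an example:

        # conf/spark-env.sh on a slave node
        SPARK_WORKER_CORES=8      # cores this Worker offers to the Master
        SPARK_WORKER_MEMORY=16g   # memory this Worker offers to the Master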
