Tags: python, pyspark, hadoop-yarn, amazon-emr

Confusion using Yarn Resource Manager


I am trying to run a simple PySpark job on Amazon AWS (EMR), and Spark is configured to use YARN via the spark-defaults.conf file. I am slightly confused about the YARN deployment code.

I have seen example code like the following:

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster('yarn-client')  # pre-Spark-2.0 style: YARN master with the driver in client mode
conf.setAppName('spark-yarn')
sc = SparkContext(conf=conf)

And I am not sure how I should submit the Spark job when 'yarn-client' is specified like this. I usually do it as follows:

$ spark-submit --deploy-mode client spark-job.py

But what is the difference between

$ spark-submit --deploy-mode client spark-job.py

and

$ spark-submit spark-job.py

And how can I tell from the Spark logs whether a job ran in client mode, cluster mode, or yarn-client mode?


Solution

  • The default --deploy-mode is client, so both of the spark-submit commands below run in client mode.

    $ spark-submit --deploy-mode client spark-job.py
    

    and

    $ spark-submit spark-job.py
    

    If you also specify --master yarn, the job will run on YARN in client mode (see the example commands after this list). Without --master, spark-submit uses the spark.master value from spark-defaults.conf, or local[*] if none is set; on EMR that file typically sets the master to YARN already, which is why the short command above still reaches YARN.

    Note: --master is the master URL for the cluster (e.g. spark://23.195.26.187:7077 for a standalone cluster). Supported cluster managers:

      • standalone
      • YARN
      • Mesos
      • Kubernetes

    --deploy-mode: whether to launch the driver on the worker nodes (cluster) or locally as an external client (client); default: client. Possible values:

      • client
      • cluster
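
  • For example, the following commands show the common combinations. spark-job.py is just the script name from the question; the cluster-mode line is included for contrast:

    $ spark-submit --master yarn --deploy-mode client spark-job.py   # driver runs on the submitting machine
    $ spark-submit --master yarn spark-job.py                        # same thing: --deploy-mode defaults to client
    $ spark-submit --master yarn --deploy-mode cluster spark-job.py  # driver runs inside a YARN container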
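
  • As for telling the modes apart: rather than digging through logs, one option is to print the effective settings from inside the job itself. A minimal sketch, assuming Spark 1.5+ (where spark-submit populates the spark.submit.deployMode property):

    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName('spark-yarn'))

    # Effective master URL, e.g. 'yarn' or 'local[*]'
    print("master:", sc.master)

    # 'client' or 'cluster'; set by spark-submit
    print("deploy mode:", sc.getConf().get("spark.submit.deployMode", "client"))

    Where that output lands is itself a clue: in client mode it appears in your terminal, while in cluster mode it goes to the YARN application logs (yarn logs -applicationId <appId>).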