I'm curious about the best practice for where to specify the number of executors.
It seems there are two places to do it: one when the job is submitted, the other when building the SparkSession.
As far as I can tell, both work (I use Spark standalone mode with pyspark, and the deploy mode is client).
Does anyone know which is the right way, or whether there is any difference between them?
Thank you!
I've tried specifying the number of executors in spark-submit:
$ spark-submit --master spark://spark-master:7077 --conf spark.executor.instances=3 --py-files my_libs.zip my_spark-main.py
My my_spark-main.py looks like this:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Spark-job-on-cluster-example") \
    .master("spark://spark-master:7077") \
    .config("spark.executor.instances", "3") \
    .config("spark.eventLog.enabled", "true") \
    .getOrCreate()
# some code below ...
When you use spark-submit, the number of executors is specified on the command line at submission time. This is good for control and flexibility: the same script can be submitted with different resources, with no code change, as sketched below.
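For example (the executor counts here are hypothetical, chosen only to illustrate; this assumes the property is not also hard-coded in the builder, for the reason explained below):

# Small run while developing
$ spark-submit --master spark://spark-master:7077 --conf spark.executor.instances=2 --py-files my_libs.zip my_spark-main.py

# Bigger run in production, same code
$ spark-submit --master spark://spark-master:7077 --conf spark.executor.instances=8 --py-files my_libs.zip my_spark-main.py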
Setting it in the SparkSession builder instead makes your application more self-contained.

One difference worth knowing: if you set the same property in both places, the value set in code wins. Per the Spark configuration docs, properties set directly on SparkConf (which is what the builder's .config() calls do) take the highest precedence, then flags passed to spark-submit, then values from spark-defaults.conf.

So: if you prefer to control it from outside the code, use spark-submit and leave the property out of the builder. If you want your application to be more self-contained, set it within the SparkSession.builder.
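As a quick sanity check, you can print the effective value at runtime to see which setting won. A minimal sketch (passing a default to get() makes the lookup safe even if the key was never set anywhere):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the setting as the driver actually resolved it;
# falls back to "not set" if neither place configured it.
print(spark.sparkContext.getConf().get("spark.executor.instances", "not set"))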