apache-spark, pyspark

Number of executors in SparkSession or spark-submit?


I'm curious about the best practice: where should the number of executors be specified?

There seem to be two places to do that: one is when the job is submitted, the other is when the SparkSession is built.

As far as I checked, it seems to work in both cases (I use Spark standalone mode with pyspark, and the deploy mode is client mode).

Does anyone know which is the right way, or whether there is any difference?

Thank you!

I've tried to specify the number of executors with spark-submit:

$ spark-submit --master spark://spark-master:7077 --py-files my_libs.zip my_spark-main.py

My my_spark-main.py looks like this:

from pyspark.sql import SparkSession

# Build the SparkSession; the executor count is set here via spark.executor.instances
spark = SparkSession.builder \
        .appName("Spark-job-on-cluster-example") \
        .master("spark://spark-master:7077") \
        .config("spark.executor.instances", "3") \
        .config("spark.eventLog.enabled", "true") \
        .getOrCreate()
# some code below ...

Solution

  • When using spark-submit, you specify the number of executors on the command line when the job is submitted (as shown in the example below). I think this is better for control and flexibility, since you can change it per run without touching the code.

    The second option, setting it in the SparkSession builder, just makes your application more self-contained.

    If you prefer to keep that control outside the code, use spark-submit. If you want your application to be more self-contained, set it within SparkSession.builder.
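
    For example, here is a minimal sketch of the command-line variant, reusing the master URL, zip file, and script name from your question and the same spark.executor.instances property that your builder already sets, passed via --conf:

    $ spark-submit \
        --master spark://spark-master:7077 \
        --conf spark.executor.instances=3 \
        --py-files my_libs.zip \
        my_spark-main.py

    With this, you can drop the .config("spark.executor.instances", ...) line from the builder, so the command line remains the single place that controls the executor count.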