Tags: apache-spark, hadoop, google-cloud-dataproc

Spark Thrift Server uses only 2 cores


Google Dataproc single-node cluster, VCores Total = 8. I've tried, as user spark:

/usr/lib/spark/sbin/start-thriftserver.sh --num-executors 2 --executor-cores 4

I tried changing /usr/lib/spark/conf/spark-defaults.conf,

and tried executing

   export SPARK_WORKER_INSTANCES=6
   export SPARK_WORKER_CORES=8

before running start-thriftserver.sh.

No success. In the YARN UI I can see that the Thrift Server app uses only 2 cores, with 6 cores still available.

UPDATE 1: the Environment tab in the Spark UI shows:

spark.submit.deployMode client
spark.master    yarn
spark.dynamicAllocation.minExecutors    6
spark.dynamicAllocation.maxExecutors    10000
spark.executor.cores    4
spark.executor.instances    1

[Screenshots: YARN UI, Spark UI]


Solution

It depends on which YARN deploy mode the app runs in. In yarn client mode, YARN allocates 1 core for the Application Master, and the driver runs on the machine where you ran start-thriftserver.sh. In yarn cluster mode, the driver lives inside the AM container, so you can tweak its cores with spark.driver.cores. The remaining cores are used by executors (1 executor = 1 core by default).

Beware that --num-executors 2 --executor-cores 4 cannot work here: that requests 2 × 4 = 8 executor cores, and 1 more is needed for the AM container, for a total of 9 when you only have 8.

You can check core usage from the Spark UI: http://sparkhistoryserverip:18080/history/application_1534847473069_0001/executors/
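As a minimal sketch of a request that does fit the budget (assuming yarn client mode on the 8-vcore node from the question; the sizing here is illustrative, not the only valid split):

    # hypothetical sizing: 2 executors x 3 cores = 6 executor cores,
    # leaving 1 core for the YARN Application Master (7 of 8 vcores)
    /usr/lib/spark/sbin/start-thriftserver.sh \
      --num-executors 2 \
      --executor-cores 3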

The options below are only for Spark standalone mode, so they have no effect under YARN:

    export SPARK_WORKER_INSTANCES=6
    export SPARK_WORKER_CORES=8
    

Please review all configs here: Spark Configuration (latest)

In your case, you can edit spark-defaults.conf and add:

    spark.executor.cores 3
    spark.executor.instances 2
    

Or use local[8] mode, as you have only one node anyway.
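A minimal sketch of that local-mode launch (start-thriftserver.sh passes these flags through to spark-submit):

    # run the Thrift Server with 8 local cores, bypassing YARN entirely
    /usr/lib/spark/sbin/start-thriftserver.sh --master local[8]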