Tags: apache-spark, pyspark, databricks, spark-ui

Spark local mode: How to query the number of executor slots?


I'm following the tutorial "Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data", which claims that the "local mode" Spark cluster available in Databricks "Community Edition" provides you with 3 executor slots. (So 3 tasks should be able to run concurrently.)

However, when I look at the "Event Timeline" visualization for job stages with multiple tasks in my own notebook on Databricks "Community Edition", it looks like up to 8 tasks were running concurrently:

[Screenshot: Event timeline in the Spark UI, with up to 8 tasks executing concurrently]

Is there a way to query the number of executor slots from PySpark or from a Databricks notebook? Or can I directly see the number in the Spark UI somewhere?


Solution

  • Databricks "slots" = Spark "cores" = available threads

    "Slots" is a term Databricks uses (or used?) for the threads available to do parallel work for Spark. The Spark documentation and the Spark UI call the same concept "cores", even though they don't necessarily correspond to physical CPU cores.

    (See this answer on Hortonworks community, and this "Spark Tutorial: Learning Apache Spark" databricks notebook.)

    View number of slots/cores/threads in Spark UI (on Databricks)

    To see how many there are in your Databricks cluster, click "Clusters" in the navigation area to the left, then hover over the entry for your cluster and click the "Spark UI" link. In the Spark UI, click the "Executors" tab.

    [Annotated screenshot: How to open the Spark UI for a Databricks cluster]

    You can see the number of executor cores (= executor slots) both in the summary and for each individual executor¹ in the "Cores" column of the respective table:

    [Screenshot: Spark UI "Executors" tab, with the summary table and a table per executor (only one executor here)]

    ¹ There's only one executor in "local mode" clusters, which are the clusters available in Databricks Community Edition.

    Query number of slots/cores/threads

    I'm not sure how to query this number from within a notebook. Running

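    spark.conf.get('spark.executor.cores')

    results in java.util.NoSuchElementException: spark.executor.cores, because that property isn't set on this cluster (spark.conf.get raises this error for an unset key when no default is given).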