Tags: apache-spark, ubuntu, pyspark, virtualbox, cpu-cores

How to check how many cores PySpark uses?


I have installed VirtualBox (Ubuntu 18.04.2 64-bit) and PySpark 2.4.0. When I created the VM, I set the maximum number of CPUs to 4.

How am I supposed to check how many cores Spark is using?


Solution

  • That depends on the master URL, which describes what runtime environment (cluster manager) to use.

    Since this is such a low-level, infrastructure-oriented setting, you can find the answer by querying the SparkContext instance.

    For example, if the master is local[*], that means you want to use as many CPUs (that is what the star stands for) as are available to the local JVM.

    $ ./bin/pyspark
    Python 2.7.15 (default, Feb 19 2019, 09:17:37)
    [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
    ...
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
          /_/
    
    Using Python version 2.7.15 (default, Feb 19 2019 09:17:37)
    SparkSession available as 'spark'.
    >>> print sc.master
    local[*]
    >>> print sc.defaultParallelism
    8
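
    If you want Spark to use a specific number of cores rather than "all available", you can pass an explicit count in the master URL and then verify it the same way. Below is a minimal sketch assuming a plain PySpark 2.4.0 install; the app name "core-check" and the choice of 4 cores (matching the VM's CPU count) are illustrative, not anything Spark detects on its own.

    from pyspark.sql import SparkSession

    # Ask for exactly 4 local cores instead of all available ones (local[*]).
    spark = SparkSession.builder \
        .master("local[4]") \
        .appName("core-check") \
        .getOrCreate()

    sc = spark.sparkContext
    print(sc.master)              # local[4]
    print(sc.defaultParallelism)  # 4

    spark.stop()

    With local[4], sc.defaultParallelism reports 4, whereas with local[*] it reports however many CPUs the JVM sees (8 in the shell session above).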