When configuring a Spark job, I have sometimes seen people suggest setting the number of cores per executor higher than the total number of cores available divided by the number of executors.
Notably, in this example @0x0FFF suggests the following:
--num-executors 4 --executor-memory 12g --executor-cores 4
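For context, a full submission using those flags might look something like the sketch below. The master/deploy-mode, application class, and jar name are placeholders I have assumed; they are not part of the original suggestion.

```sh
# A minimal sketch of how the suggested flags might appear in a full
# spark-submit call. The application class and jar are hypothetical
# placeholders; only the three resource flags come from the suggestion.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 12g \
  --executor-cores 4 \
  --class com.example.MyApp \
  my-app.jar
```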
If we compute the total number of executor cores, we get 4 cores per executor * 4 executors = 16 cores in total.
However, at the beginning of the question it says "I have one NameNode and two DataNode with 30GB of RAM each, 4 cores each". So the total number of cores is 2 worker nodes * 4 cores each = 8 cores.
Is it possible to have 16 cores utilized by 4 executors with this hardware? If so, how?
So, as I wrote in a comment, Spark will spin up one thread per core, and I know that on YARN you cannot assign more cores to an executor than are available on a node; if you do, YARN simply won't launch those executors. This is also described in more detail in this blog post from Cloudera.
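For what it's worth, a sizing that would actually fit the hardware described in the question (2 worker nodes with 4 cores and 30GB RAM each) might look something like the sketch below. The exact numbers are my own illustration, leaving a little CPU and memory headroom on each node for the OS and Hadoop/YARN daemons; they are not values from the question or the Cloudera post.

```sh
# Illustrative sizing for 2 worker nodes with 4 cores / 30GB RAM each:
# one executor per node, leaving roughly 1 core and some memory per node
# for the OS and YARN/HDFS daemons. These numbers are an assumption, not
# taken from the question or the linked Cloudera post.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 3 \
  --executor-memory 20g \
  --class com.example.MyApp \
  my-app.jar
```

With that layout, 2 executors * 3 cores = 6 cores are requested in total, which fits comfortably within the 8 cores available across the two DataNodes, so YARN has no reason to refuse the executors.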