When configuring a Spark job, I have sometimes seen people suggest setting the number of cores per executor higher than the total number of cores available divided by the number of executors.
Notably, in this example @0x0FFF suggests the following:
--num-executors 4 --executor-memory 12g --executor-cores 4
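For context, a full submission using those flags might look something like the sketch below. The master/deploy-mode, application class, and jar name are placeholders I have assumed; they are not part of the original suggestion.

```sh
# A minimal sketch of how the suggested flags might appear in a full
# spark-submit call. The application class and jar are hypothetical
# placeholders; only the three resource flags come from the suggestion.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 12g \
  --executor-cores 4 \
  --class com.example.MyApp \
  my-app.jar
```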
If we compute the total number of executor cores, we get 4 cores per executor * 4 executors = 16 cores in total.
However, at the beginning of the question it says "I have one NameNode and two DataNode with 30GB of RAM each, 4 cores each". So the total number of cores is 2 worker nodes * 4 cores each = 8 cores.
Is it possible to have 16 cores utilized by 4 executors with this hardware? If so, how?
So, as I wrote in a comment, Spark will spin up one thread per core, and I know that on YARN you cannot assign more cores to an executor than are available on a node; if you do, YARN simply won't launch those executors. This is also described in more detail in this blog post from Cloudera.
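For what it's worth, a sizing that would actually fit the hardware described in the question (2 worker nodes with 4 cores and 30GB RAM each) might look something like the sketch below. The exact numbers are my own illustration, leaving a little CPU and memory headroom on each node for the OS and Hadoop/YARN daemons; they are not values from the question or the Cloudera post.

```sh
# Illustrative sizing for 2 worker nodes with 4 cores / 30GB RAM each:
# one executor per node, leaving roughly 1 core and some memory per node
# for the OS and YARN/HDFS daemons. These numbers are an assumption, not
# taken from the question or the linked Cloudera post.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 3 \
  --executor-memory 20g \
  --class com.example.MyApp \
  my-app.jar
```

With that layout, 2 executors * 3 cores = 6 cores are requested in total, which fits comfortably within the 8 cores available across the two DataNodes, so YARN has no reason to refuse the executors.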