Tags: scala, apache-spark, parallel-processing, spark-streaming, starvation

What is Starvation scenario in Spark streaming?


In the famous word count example for Spark Streaming, the Spark configuration object is initialized as follows:

/* Create a local StreamingContext with two working thread and batch interval of 1 second.
The master requires 2 cores to prevent from a starvation scenario. */

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")

If I change the master from local[2] to local, or do not set the master at all, I do not get the expected output; in fact, word counting does not happen at all.

The comment says:

"The master requires 2 cores to prevent from a starvation scenario" that's why they have done setMaster("local[2]").

Can somebody explain why it requires 2 cores and what a starvation scenario is?


Solution

  • From the documentation:

    [...] note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).

    In other words, one thread will be used to run the receiver, and at least one more is needed to process the received data. For a cluster, the number of allocated cores must be greater than the number of receivers; otherwise the system cannot process the data.

    Hence, when running locally you need at least 2 threads, and when using a cluster at least 2 cores need to be allocated to your application.
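
    For illustration, here is a minimal sketch of the streaming word count (the host localhost and port 9999 are placeholders for whichever source you actually read from). The socketTextStream call creates the receiver, i.e. the long-running task that pins one of the two local threads, while the remaining thread processes the batches:

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object WordCount {
          def main(args: Array[String]): Unit = {
            // "local[2]": one thread runs the receiver, the other processes the batches.
            // With "local" (a single thread) the receiver occupies the only thread and
            // no batch processing can ever run.
            val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
            val ssc = new StreamingContext(sparkConf, Seconds(1))

            // socketTextStream creates a receiver: a long-running task that permanently
            // occupies one core (or local thread) for the lifetime of the application.
            val lines = ssc.socketTextStream("localhost", 9999)
            val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
            wordCounts.print()

            ssc.start()
            ssc.awaitTermination()
          }
        }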


    A starvation scenario refers to this type of problem: some threads are never able to execute at all while others make progress.
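
    To see the same effect outside Spark, here is a minimal sketch using a plain Java thread pool (the object name StarvationDemo and the task bodies are purely illustrative): a single-thread pool stands in for "local", the never-ending task for the receiver, and the queued task for the batch processing that never gets to run:

        import java.util.concurrent.Executors

        object StarvationDemo {
          def main(args: Array[String]): Unit = {
            // A pool with a single thread plays the role of setMaster("local").
            val pool = Executors.newFixedThreadPool(1)

            // The "receiver": a task that never finishes and never releases the only thread.
            pool.submit(new Runnable {
              def run(): Unit = while (true) Thread.sleep(1000)
            })

            // The "processing" task starves: it stays in the queue forever because the
            // single thread is permanently busy. With newFixedThreadPool(2) it would run.
            pool.submit(new Runnable {
              def run(): Unit = println("processing the received data")
            })
          }
        }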

    There are two classical problems where starvation is well known: