Tags: apache-spark, task, executor

Spark - Understanding a simple application on Standalone cluster


When I run this sample application from the spark-shell, the UI shows an executor with 8 tasks. Why are 8 tasks required for such a small data set?

Please note that I am running on a standalone local cluster with 8 cores.

val data = Array(1,2,3,4)
val distData = sc.parallelize(data)
distData.collect()

Solution

  • By default, parallelize splits the data into as many partitions as spark.default.parallelism, which on this setup equals the number of available cores (8), and each partition becomes one task. You can pass a second parameter to parallelize to override the number of partitions, as shown below.
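
A minimal spark-shell sketch of both behaviors. The second argument to parallelize (numSlices) is the standard Spark API; the choice of 2 partitions here is arbitrary, for illustration:

val data = Array(1, 2, 3, 4)

// Default: one partition per core, so 8 tasks on this 8-core setup
// (the exact count depends on spark.default.parallelism)
val distData = sc.parallelize(data)
println(distData.getNumPartitions)   // 8 here

// Override: request 2 partitions, so collect() schedules only 2 tasks
val distData2 = sc.parallelize(data, 2)
println(distData2.getNumPartitions)  // 2
distData2.collect()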