When I run this sample application from the spark-shell, the UI shows an executor with 8 tasks. Why are 8 tasks required for such a small data set?
Note that I am running on a standalone local cluster with 8 cores.
val data = Array(1,2,3,4)
val distData = sc.parallelize(data)
distData.collect()
By default, `sc.parallelize` splits the data into `spark.default.parallelism` partitions, which on a local standalone cluster equals the number of available cores (8 here), and Spark runs one task per partition. You can pass a second argument to `parallelize` to override the number of partitions.
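As a sketch of what that looks like in the spark-shell (where `sc` is the SparkContext the shell provides; `getNumPartitions` is available on RDDs in Spark 1.6 and later):

```scala
// Default: one partition (and hence one task) per core -- 8 on this machine.
val data = Array(1, 2, 3, 4)
val distData = sc.parallelize(data)
distData.getNumPartitions   // 8 on an 8-core local cluster

// Pass the second parameter to override: 2 partitions -> 2 tasks on the UI.
val twoParts = sc.parallelize(data, 2)
twoParts.getNumPartitions   // 2
twoParts.collect()          // Array(1, 2, 3, 4)
```

With only 4 elements, anything beyond a couple of partitions just produces near-empty tasks, so shrinking the partition count is harmless here.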