scala, apache-spark, apache-spark-mllib, word2vec

Why does Word2Vec run only one task for mapPartitionsWithIndex at Word2Vec.scala:323?


I am running Word2Vec in Spark, and when it reaches fit(), the UI shows only one task for the mapPartitionsWithIndex stage.

The job is configured with num-executors = 1000 and executor-cores = 2, and the RDD is coalesced to 2000 partitions. The mapPartitionsWithIndex stage takes a long time. Can it be distributed across multiple executors or tasks?


Solution

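  • Calling setNumPartitions(numPartitions: Int) solved my problem. I had not checked the default value; the documentation says: "Sets number of partitions (default: 1)". With the default of 1, training runs as a single task regardless of how many partitions the input RDD has.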
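
For illustration, here is a minimal sketch of the fix using the MLlib RDD API (org.apache.spark.mllib.feature.Word2Vec). The object name, the input path taken from the first program argument, the whitespace tokenization, and the value 200 are all assumptions made for this example, not details from the original job. As far as I can tell, fit() repartitions the input to numPartitions internally, which would explain why the 2000 input partitions had no effect on the task count.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD

// Hypothetical driver object; submit with spark-submit and pass the corpus path as args(0).
object Word2VecPartitionsExample {

  // Trains Word2Vec with an explicit partition count instead of the default of 1.
  def train(sentences: RDD[Seq[String]], numPartitions: Int): Word2VecModel =
    new Word2Vec()
      .setNumPartitions(numPartitions) // default is 1, which yields a single training task
      .fit(sentences)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("word2vec-partitions-example"))
    try {
      // Assumption: one sentence per line, tokens separated by whitespace.
      val sentences: RDD[Seq[String]] =
        sc.textFile(args(0)).map(_.trim.split("\\s+").toSeq)

      // 200 is an arbitrary illustrative value; tune it for your data and cluster.
      val model = train(sentences, numPartitions = 200)

      println(s"Vocabulary size: ${model.getVectors.size}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that the Spark documentation for this setter also advises using a small number for accuracy, so raising numPartitions trades some model quality for training parallelism. Setting it as high as the number of available cores is probably not what you want.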