scala collections parallel-processing parallel-collections

What's the cost of converting a sequential collection into a parallel one, versus creating it from scratch?


According to the official docs, there are two options for creating parallel collections:

1)

// Create an empty parallel vector directly
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]

2)

val pv = Vector(1,2,3,4,5,6,7,8,9).par

Now, what are the differences? Is there any performance penalty when I convert from a simple sequential collection?

What would you do if you had to create a big parallel collection (say, several thousand elements)? Would you create it from scratch or convert it?

Thank you guys!

EDIT:

As @oxbow_lakes says, there's a piece of the docs that focuses on this topic, but I'm looking for advice from experience. I mean, what would YOU do if you had to read a big collection from a DB, for instance?


Solution

  • It depends on the collection. Converting a Vector is basically free: ParVector is just a wrapper around the Vector. The same goes for Arrays. Others, e.g. List, have to be copied completely into a different structure that is more amenable to parallelism, and then copied back into a new List if you want your result to be a List too.

    You may want to have a look at this brand new guide on the Scala documentation site, in the section "Creating a parallel collection".
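
To make the distinction above concrete, here is a minimal sketch of the three conversions. It assumes Scala 2.12 or earlier, where `.par` is part of the standard library. The object name `ParConversionSketch` and its helper methods are made up for illustration and are not from the docs.

```scala
import scala.collection.parallel.immutable.{ParSeq, ParVector}
import scala.collection.parallel.mutable.ParArray
// Assumption: Scala 2.12 or earlier. On Scala 2.13+, .par lives in the separate
// scala-parallel-collections module and additionally needs:
//   import scala.collection.parallel.CollectionConverters._

object ParConversionSketch {
  // Vector -> ParVector: the parallel collection wraps the existing Vector.
  def fromVector(v: Vector[Int]): ParVector[Int] = v.par

  // Array -> ParArray: likewise a cheap conversion.
  def fromArray(a: Array[Int]): ParArray[Int] = a.par

  // List -> ParSeq: the elements are copied into a different structure
  // that is better suited to being split across threads.
  def fromList(l: List[Int]): ParSeq[Int] = l.par

  // Round trip: copy into a parallel structure, transform in parallel,
  // then copy back into a new List.
  def doubledList(l: List[Int]): List[Int] = l.par.map(_ * 2).toList

  def main(args: Array[String]): Unit = {
    val n = 10000
    println(fromVector(Vector.range(0, n)).sum)
    println(fromArray(Array.range(0, n)).sum)
    println(doubledList(List.range(0, n)).take(3))
  }
}
```

So if the data arrives as a List (e.g. from a DB layer), it may be worth building a Vector or an Array in the first place, so that the later `.par` call does not pay for a full copy.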