According to the official docs, there are two ways to create parallel collections:
1)
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]
2)
val pv = Vector(1,2,3,4,5,6,7,8,9).par
Now, what are the differences? Is there any performance penalty when I convert from a plain sequential collection?
What would you do if you had to create a big parallel collection (say, several thousand elements): would you create it from scratch or convert it?
Thank you guys!
EDIT:
As @oxbow_lakes says, there's a section of the docs that focuses on this topic, but I'm looking for advice from experience. I mean, what would YOU do if you had to read a big collection from a DB, for instance?
It depends on the collection. Converting a Vector is basically free: ParVector is just a wrapper around the vector. The same goes for Arrays. Others, e.g. List, have to be completely copied into a different structure that is more amenable to parallelism, and then copied back into a new list if you want your result to be a List too.
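To make the difference concrete, here is a small sketch of my own (not taken from the docs). It assumes Scala 2.12 or earlier, where .par ships with the standard library. The names vec, list, parFromVec and so on are just illustrative, and the sizes are arbitrary.

```scala
// Sketch for Scala 2.12 or earlier, where .par is part of the standard library.
// On Scala 2.13+ you would need the separate scala-parallel-collections module
// plus: import scala.collection.parallel.CollectionConverters._

val vec  = Vector.range(0, 10000)
val list = List.range(0, 10000)

// Vector: the resulting ParVector wraps the existing vector, nothing is copied.
val parFromVec = vec.par

// List: the elements are first copied into a structure that can be split
// for parallel work, so this conversion is linear in the size of the list.
val parFromList = list.par

// Parallel operations look the same on both.
val doubledSum = parFromVec.map(_ * 2).sum

// Getting a List back out costs one more copy.
val backToList: List[Int] = parFromList.map(_ + 1).toList
```

So if the data ends up in a Vector or an Array anyway, calling .par on it should cost next to nothing. With a List you pay for a copy on the way in and another on the way out.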
You may want to have a look at this brand new guide on the Scala documentation site, in the section Creating a parallel collection.