performance scala collections scala-collections processing-efficiency

Performance Difference Using Update Operation on a Mutable Map in Scala with a Large Size Data

I would like to know if an update operation on a mutable map is better in performance than reassignment.

Lets assume I have the following Map

val m=Map(1 -> Set("apple", "banana"),
          2 -> Set("banana", "cabbage"),
          3 -> Set("cabbage", "dumplings"))

which I would like to reverse into this map:

 Map("apple" -> Set(1),
     "banana" -> Set(1, 2),
     "cabbage" -> Set(2, 3),
     "dumplings" -> Set(3))

The code to do so is:

def reverse(m:Map[Int,Set[String]])={
  var rm = Map[String,Set[Int]]()
  m.keySet foreach { k=>
       m(k) foreach { e =>
         rm = rm + (e -> (rm.getOrElse(e, Set()) + k))
       }
  }
  rm
}

Would it be more efficient to use the update operator on a map if it is very large in size?

The code using the update on map is as follows:

def reverse(m:Map[Int,Set[String]])={
  var rm = scala.collection.mutable.Map[String,Set[Int]]()
  m.keySet foreach { k=>
      m(k) foreach { e =>
         rm.update(e,(rm.getOrElse(e, Set()) + k))                                                        
      }
  }
  rm
}

Solution

I ran some tests using Rex Kerr's Thyme utility.

First I created some test data.

val rndm = new util.Random
val dna = Seq('A','C','G','T')
val m = (1 to 4000).map(_ -> Set(rndm.shuffle(dna).mkString
                                ,rndm.shuffle(dna).mkString)).toMap

Then I timed some runs with both the immutable.Map and mutable.Map versions. Here's an example result:

Time:    2.417 ms   95% CI 2.337 ms - 2.498 ms   (n=19)  // immutable
Time:    1.618 ms   95% CI 1.579 ms - 1.657 ms   (n=19)  // mutable
Time     2.278 ms   95% CI 2.238 ms - 2.319 ms   (n=19)  // functional version

As you can see, using a mutable Map with update() has a significant performance advantage.

Just for fun I also compared these results with a more functional version of a Map reverse (or what I call a Map inverter). No var or any mutable type involved.

m.flatten{case(k, vs) => vs.map((_, k))}
 .groupBy(_._1)
 .mapValues(_.map(_._2).toSet)

This version consistently beat your immutable version but still doesn't come close to the mutable timings.