scala optimization jvm hashtable temporary-objects

Scala speed when using the get() method on hash tables: are temporary Option objects generated?


I am converting some code to Scala. It's code that sits in an inner loop with very large amounts of data so it needs to be fast, and it involves looking up keys in a hash table and computing probabilities. It needs to do different things depending on whether a key is found or not. The code would look like this using the "standard" idiom:

counts.get(word) match {
  case None => {
    WordDist.overall_word_probs.get(word) match {
      case None => (unseen_mass*WordDist.globally_unseen_word_prob
                    / WordDist.num_unseen_word_types)
      case Some(owprob) => unseen_mass * owprob / overall_unseen_mass
    }
  }
  case Some(wordcount) => wordcount.toDouble/total_tokens*(1.0 - unseen_mass)
}

but I am concerned that code of this sort is going to be very slow because of all these temporary Some() objects being created and then garbage-collected. The Scala2e book claims that a smart JVM "might" optimize these away so that the code does the right thing efficiency-wise, but does this actually happen using Sun's JVM? Anyone know?


Solution

  • This may happen if you enable escape analysis in the JVM, using the flag:

    -XX:+DoEscapeAnalysis
    

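    on JRE 1.6. Essentially, it should detect objects that do not escape the method's activation frame and avoid a normal heap allocation for them, so they never become work for the garbage collector. (HotSpot does this through scalar replacement, keeping the object's fields in registers or on the stack.)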

    One thing you could do is microbenchmark your code using the scala.testing.Benchmark trait. Extend it with a singleton object, implement the run method, then compile and run it. It calls the run method multiple times and measures the execution time of each pass.
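
As a rough sketch of that kind of measurement, the following compares the Option-based lookup against a contains/apply lookup that never creates an Option. It uses a hand-rolled System.nanoTime loop rather than scala.testing.Benchmark, so it does not depend on that trait being available in your Scala version. The object name, the table contents and the key set are all hypothetical stand-ins, not the asker's real data.

```scala
import scala.collection.mutable

object OptionLookupBench {
  // Hypothetical stand-in data; real code would use its own counts table.
  val counts: mutable.HashMap[String, Int] = mutable.HashMap.empty
  for (i <- 0 until 100000) counts("w" + i) = i + 1

  // Half of the probed keys are missing, so both branches get exercised.
  val keys: Array[String] = Array.tabulate(200000)(i => "w" + i)

  // Idiomatic version: get() returns an Option that is matched immediately.
  def sumWithOption(): Long = {
    var sum = 0L
    var i = 0
    while (i < keys.length) {
      counts.get(keys(i)) match {
        case Some(c) => sum += c
        case None    => sum -= 1
      }
      i += 1
    }
    sum
  }

  // Alternative that never creates an Option, at the cost of a second lookup.
  def sumWithContains(): Long = {
    var sum = 0L
    var i = 0
    while (i < keys.length) {
      val k = keys(i)
      if (counts.contains(k)) sum += counts(k) else sum -= 1
      i += 1
    }
    sum
  }

  // Runs body `reps` times; returns each run's wall-clock time in milliseconds.
  def time(reps: Int)(body: => Long): Seq[Double] =
    (1 to reps).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }

  def main(args: Array[String]): Unit = {
    // Early runs include JIT warm-up; compare the later ones.
    println("Option:   " + time(10)(sumWithOption()).mkString(" "))
    println("contains: " + time(10)(sumWithContains()).mkString(" "))
  }
}
```

Running it once with -XX:+DoEscapeAnalysis and once with -XX:-DoEscapeAnalysis should give some indication of whether the JVM in use removes the temporary Some objects. Treat the numbers as a hint only, since a loop this small is easily distorted by warm-up and garbage-collection pauses.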