Search code examples
scalamemory-managementmemory-leaksprofiler

Dot product in Scala without heap allocation


I have a Scala project with some intensive arithmetics, and it sometimes allocates Floats faster than the GC can clean them up. (This is not about memory leaks caused by retained references, just fast memory consumption for temporary values.) I try to use Arrays with primitive types, and reuse them when I can, but still some new allocations sneak in.

One piece that puzzles me, for instance:

import org.specs2.mutable.Specification

class CalcTest extends Specification {

  def dot(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "array size mismatch")
    val n = a.length
    var sum: Float = 0f
    var i = 0
    while (i < n) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  val vector = Array.tabulate(1000)(_.toFloat)

  "calculation" should {
    "use memory sparingly" >> {
      val before = Runtime.getRuntime().freeMemory()

      for (i <- 0 to 1000000)
        dot(vector, vector)

      val after = Runtime.getRuntime().freeMemory()
      (before - after) must be_<(1000L)  // actual result above 4M
    }
  }
}

I would expect it to compute the dot products using only stack memory, but apparently it allocates about 4 bytes per call on the heap. This may not sound like much, but it adds up quickly in my code.

I was suspecting the sum, but from the bytecode output, looks like it is on the stack:

    aload 1
    arraylength
    istore 3
    fconst_0
    fstore 4
    iconst_0
    istore 5
   l2
    iload 5
    iload 3
    if_icmpge l3
    fload 4
    aload 1
    iload 5
    faload
    aload 2
    iload 5
    faload
    fmul
    fadd
    fstore 4
    iload 5
    iconst_1
    iadd
    istore 5
    _goto l2
   l3
    fload 4
    freturn

Is it the return value that goes on the heap? Is there any way to avoid this overhead entirely? Is there a better way to investigate and solve such memory problems?

From the visualVM output for my project, I only see that i have an awful lot of Floats allocated. It is hard to track there small objects like that, being allocated rapidly. It is more useful for large objects and memory snapshots taken at long intervals.

Update:

I was so focused on the function code, I missed the problem in the test. If I rewrite it with a while loop, it succeeds:

var i = 0
while (i < 1000000) {
  dot(vector, vector)
  i += 1
}

I would still appreciate more ideas for other ways to debug this sort of issues, in addition to tests like this and using visualVM memory snapshots.


Solution

  • Range implementation in

    for (i <- 0 to 1000000)
      dot(vector, vector)
    

    might use some memory, or just be slow enough to let JVM allocate something else in background and break the fragile measurement method used in the test.

    Try to modify these lines into a while loop, for example.

    (The original version of this post said that for() was equivalent to map(), which was wrong. It is equivalent to foreach() here because it does not have a yield clause.)