I have a Scala project with some intensive arithmetic, and it sometimes allocates Floats faster than the GC can clean them up. (This is not about memory leaks caused by retained references, just rapid memory consumption for temporary values.) I try to use Arrays of primitive types and reuse them where I can, but some new allocations still sneak in.
One piece that puzzles me, for instance:
import org.specs2.mutable.Specification

class CalcTest extends Specification {
  def dot(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "array size mismatch")
    val n = a.length
    var sum: Float = 0f
    var i = 0
    while (i < n) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  val vector = Array.tabulate(1000)(_.toFloat)

  "calculation" should {
    "use memory sparingly" >> {
      val before = Runtime.getRuntime().freeMemory()
      for (i <- 0 to 1000000)
        dot(vector, vector)
      val after = Runtime.getRuntime().freeMemory()
      (before - after) must be_<(1000L) // actual result above 4M
    }
  }
}
I would expect it to compute the dot products using only stack memory, but apparently it allocates about 4 bytes per call on the heap (the test above measures over 4 MB across a million calls). This may not sound like much, but it adds up quickly in my code.
I suspected the sum, but from the bytecode output it looks like it stays on the stack:
aload 1
arraylength
istore 3
fconst_0
fstore 4
iconst_0
istore 5
l2
iload 5
iload 3
if_icmpge l3
fload 4
aload 1
iload 5
faload
aload 2
iload 5
faload
fmul
fadd
fstore 4
iload 5
iconst_1
iadd
istore 5
_goto l2
l3
fload 4
freturn
Is it the return value that goes on the heap? Is there any way to avoid this overhead entirely? Is there a better way to investigate and solve such memory problems?
From the VisualVM output for my project, I only see that I have an awful lot of Floats allocated. It is hard to track small, rapidly allocated objects like that there; VisualVM is more useful for large objects and for memory snapshots taken at long intervals.
Update:
I was so focused on the function code that I missed the problem in the test. If I rewrite the test loop as a while loop, it succeeds:
var i = 0
while (i < 1000000) {
  dot(vector, vector)
  i += 1
}
I would still appreciate ideas for other ways to debug this sort of issue, in addition to tests like this and VisualVM memory snapshots.
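One idea I have not fully explored yet: on a HotSpot JVM the standard ThreadMXBean can be cast to the HotSpot-specific com.sun.management.ThreadMXBean, which reports bytes allocated per thread. A minimal sketch (the AllocationProbe name is just mine for illustration, and it assumes a HotSpot JVM where this cast works):

import java.lang.management.ManagementFactory
import com.sun.management.{ThreadMXBean => HotSpotThreadMXBean}

object AllocationProbe {
  // HotSpot-specific bean with per-thread allocation counters
  private val bean =
    ManagementFactory.getThreadMXBean().asInstanceOf[HotSpotThreadMXBean]

  // Bytes allocated on the heap by the current thread so far
  private def allocatedBytes: Long =
    bean.getThreadAllocatedBytes(Thread.currentThread().getId)

  // Runs body and returns how many bytes the current thread allocated meanwhile
  def measure(body: => Unit): Long = {
    val before = allocatedBytes
    body
    allocatedBytes - before
  }
}

Because the counter is per thread, it should be less sensitive to background allocations than comparing Runtime.freeMemory() before and after.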
The Range implementation behind
for (i <- 0 to 1000000)
  dot(vector, vector)
might allocate some memory itself, or simply be slow enough to let the JVM allocate something else in the background and break the fragile measurement method used in the test.
Try rewriting these lines as a while loop, for example.
(The original version of this post said that for() was equivalent to map(), which was wrong. It is equivalent to foreach() here because it does not have a yield clause.)
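For reference, here is roughly what that loop desugars to; this is my sketch of the usual Scala 2 desugaring rather than actual compiler output, with one hedged guess at where the allocation could come from:

// A for loop without yield desugars to foreach over the generated Range:
(0 to 1000000).foreach(i => dot(vector, vector))

// Range.foreach is generic in its result type (foreach[U](f: Int => U)),
// so the Float returned by dot may be boxed to java.lang.Float on each
// iteration as it passes through the generic Function1.apply bridge.
// That would fit the large number of Floats seen in VisualVM, but it is
// a guess, not something verified from the bytecode.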