For comprehension and number of function creation

Recently I had an interview for Scala Developer position. I was asked such question

// matrix 100x100 (content unimportant)

val matrix = Seq.tabulate(100, 100) { case (x, y) => x + y }

// A

for {

   row <- matrix

   elem <- row

} print(elem)

// B

val func = print _
for {

   row <- matrix

   elem <- row

} func(elem)

and the question was: Which implementation, A or B, is more efficent?

We all know that for comprehensions can be translated to

// A

matrix.foreach(row => row.foreach(elem => print(elem)))

// B

matrix.foreach(row => row.foreach(func))

B can be written as matrix.foreach(row => row.foreach(print _))

Supposedly correct answer is B, because A will create function print 100 times more.

I have checked Language Specification but still fail to understand the answer. Can somebody explain this to me?

Solution

In short:

Example A is faster in theory, in practice you shouldn't be able to measure any difference though.

Long answer:

As you already found out

for {xs <- xxs; x <- xs} f(x)

is translated to

xxs.foreach(xs => xs.foreach(x => f(x)))

This is explained in §6.19 SLS:

A for loop
for ( p <- e; p' <- e' ... ) e''
where ... is a (possibly empty) sequence of generators, definitions, or guards, is translated to
e .foreach { case p => for ( p' <- e' ... ) e'' }

Now when one writes a function literal, one gets a new instance every time the function needs to be called (§6.23 SLS). This means that

xs.foreach(x => f(x))

is equivalent to

xs.foreach(new scala.Function1 { def apply(x: T) = f(x)})

When you introduce a local function type

val g = f _; xxs.foreach(xs => xs.foreach(x => g(x)))

you are not introducing an optimization because you still pass a function literal to foreach. In fact the code is slower because the inner foreach is translated to

xs.foreach(new scala.Function1 { def apply(x: T) = g.apply(x) })

where an additional call to the apply method of g happens. Though, you can optimize when you write

val g = f _; xxs.foreach(xs => xs.foreach(g))

because the inner foreach now is translated to

xs.foreach(g())

which means that the function g itself is passed to foreach.

This would mean that B is faster in theory, because no anonymous function needs to be created each time the body of the for comprehension is executed. However, the optimization mentioned above (that the function is directly passed to foreach) is not applied on for comprehensions, because as the spec says the translation includes the creation of function literals, therefore there are always unnecessary function objects created (here I must say that the compiler could optimize that as well, but it doesn't because optimization of for comprehensions is difficult and does still not happen in 2.11). All in all it means that A is more efficient but B would be more efficient if it is written without a for comprehension (and no function literal is created for the innermost function).

Nevertheless, all of these rules can only be applied in theory, because in practice there is the backend of scalac and the JVM itself which both can do optimizations - not to mention optimizations that are done by the CPU. Furthermore your example contains a syscall that is executed on every iteration - it is probably the most expensive operation here that outweighs everything else.