scala performance scala-collections jmh elementwise-operations

Why is zipped faster than zip in Scala?

I have written some Scala code to perform an element-wise operation on a collection. Here I defined two methods that perform the same task. One method uses zip and the other uses zipped.

def ES (arr :Array[Double], arr1 :Array[Double]) :Array[Double] = arr.zip(arr1).map(x => x._1 + x._2)

def ES1(arr :Array[Double], arr1 :Array[Double]) :Array[Double] = (arr,arr1).zipped.map((x,y) => x + y)

To compare these two methods in terms of speed, I wrote the following code:

def fun (arr : Array[Double] , arr1 : Array[Double] , f :(Array[Double],Array[Double]) => Array[Double] , itr : Int) ={
  val t0 = System.nanoTime()
  for (i <- 1 to itr) {
       f(arr,arr1)
       }
  val t1 = System.nanoTime()
  println("Total Time Consumed:" + ((t1 - t0).toDouble / 1000000000).toDouble + "Seconds")
}

I call the fun method and pass ES and ES1 as below:

fun(Array.fill(10000)(math.random), Array.fill(10000)(math.random), ES , 100000)
fun(Array.fill(10000)(math.random), Array.fill(10000)(math.random), ES1, 100000)

The results show that the method named ES1 that uses zipped is faster than method ES that uses zip. Based on these observations, I have two questions.

Why is zipped faster than zip?

Is there any even faster way to do element-wise operations on a collection in Scala?

Solution

To answer your second question:

Is there any more faster way to do element wise operation on a collection in Scala?

The sad truth is that despite it's conciseness, improved productivity, and resilience to bugs, is that functional languages aren't necessarily the most performant. In the OP's example, using higher order functions to define a projection to be executed against collections has overhead, and the tight loop amplifies this. As others have pointed out, additional storage allocation for intermediate and final results will also have overhead.

If performance is critical, although by no means universal, in cases like yours you can unwind concise functional code back into imperative equivalents in order to regain more direct control over memory usage and eliminating function call overheads.

In your specific example, the zipped sums can be performed imperatively by pre-allocating a fixed, mutable array of correct size (since zip stops when one of the collections runs out of elements), and then adding elements at the appropriate index together (since accessing array elements by ordinal index is a very fast operation).

By example, adding a third function, ES3 to your test suite:

def ES3(arr :Array[Double], arr1 :Array[Double]) :Array[Double] = {
   val minSize = math.min(arr.length, arr1.length)
   val array = Array.ofDim[Double](minSize)
   for (i <- 0 to minSize - 1) {
     array(i) = arr(i) + arr1(i)
   }
  array
}

On my i7 I get the following response times:

OP ES Total Time Consumed:23.3747857Seconds
OP ES1 Total Time Consumed:11.7506995Seconds
--
ES3 Total Time Consumed:1.0255231Seconds

Even more heineous would be to do direct in-place mutation of the shorter of the two arrays, which would obviously corrupt the contents of this shorter array, so should only be implemented if the original arrays wouldn't be needed for further work by the caller:

def ES4(arr :Array[Double], arr1 :Array[Double]) :Array[Double] = {
   val minSize = math.min(arr.length, arr1.length)
   val array = if (arr.length < arr1.length) arr else arr1
   for (i <- 0 to minSize - 1) {
      array(i) = arr(i) + arr1(i)
   }
  array
}

Total Time Consumed:0.3542098Seconds

But obviously, direct mutation of arrays elements passed as parameters isn't in the spirit of Scala - this code smells of side effects.

To be honest, if you require this level of performance optimization in tight loops, you're likely better off writing these kinds of algorithms in Rust, Go or C.