Search code examples
scalalazy-evaluation

How do you improve the performance of foldLeft (or scanLeft) when you don't always need to examine all the elements of a list?


For example if you have millions of elements, but typically only need to examine the first million(e.g. if you are accumulating a sum and you saturate at some max value or you have some other complex data structure you are building, but you finish after examining the first M elements). FoldLeft always forces you to iterate over the entire sequence. Ideally, you could supply a predicate that lets foldLeft know you are done.

If scanLeft is evaluated lazily(?), perhaps scanLeft along with a find (find first valid element) can accomplish this. I believe something like this would work in Haskell, but not sure about Scala.

numbers.scanLeft(0)((a, b) => a + b).find(_ >= 100)

So if numbers = List(100,0,9,10), then scanLeft will only look at the first element.


Solution

  • scanLeft produces a lazy collection for already lazy collections, like Iterator or LazyList (Strem prior 2.13). Thus, you can use that to abort early.
    For example:

    LazyList.continually(100)
      .scanLeft(0) { case (acc, n) => acc + n }
      .takeWhile(_ < 1000)
      .toList
    // res: List[Int] = List(0, 100, 200, 300, 400, 500, 600, 700, 800, 900)
    
    List(0, 100, 5, 300)
      .iterator
      .map(i => { println(i); i })
      .scanLeft(0) { case (acc, n) => acc + n }
      .find(_ >= 100)
    // 0
    // 100
    // res: Option[Int] = Some(100)