Search code examples
scala

Is it possible to 'stream' the maximum of the output of a for-comprehension in Scala?


I'm working on a problem in Scala where I have two input Seq[Int], A and B, and an value function f and the goal is to find the max value of f for any pair of elements (a, b) in A and B.

We can do this naively with a for-comprehension:

val A = Seq(1, 2, 3, 4, 5)
val B = Seq(6, 7, 8, 9, 10)
def f(a: Int, b: Int): Int = a * b // f can be more complex than just a multiplication. This is just an example. 

val results = for {
  a <- A
  b <- B
} yield f(a, b)

results.max

The problem that I'm running into is that A and B can be very large and computing every combination of (a, b) is causing memory errors. I don't actually care about the entire list of (a, b) or even the entire list of results either. I only really care about the maximum value.

Is is possible to stream the list of values produced by the for-comprehension into f and then into the max function so that I don't have to hold the entire list in memory?

Notes

  • The problem that I'm working on is Leetcode #11 - Container with Most Water. I know about the linear time solution, but what I'm really curious about here is whether we can stream a logical sequence of values into an aggregator function in Scala.
  • In my research, I found scala Streams, which look like the previous implementation of the current LazyList class, but these don't look like they will give me the behavior that I want. I don't think that 'stream' is the right word to describe the behavior I want in scala, but I'm not sure what to search for next.

Solution

  • I had a look at the comments and took the experimental approach. I booted up a Scala shell and ran the two following snippets of code.

    Using LazyList:

    val a = LazyList.continually(0).take(Int.MaxValue)
    val b = LazyList.continually(0).take(Int.MaxValue)
    val results = for (a <- a; b <- b) yield (a, b)
    results.max
    

    LazyList doesn't just lazily evaluate the collection, but also tries to hold on onto it, meaning that the collection will "leak" even though that's not the behavior that you were looking for. What happened in my test run was that the process never ran out of memory (although I believe it eventually would have) but there was always a little bit of memory to be reclaimed, so that there's so much garbage collection happening that the process practically stops making meaningful process. See the following VisualVM memory monitoring chart:

    using a lazy list

    Using Iterator:

    val a = Iterator.continually(0).take(Int.MaxValue)
    val b = Iterator.continually(0).take(Int.MaxValue)
    val results = for (a <- a; b <- b) yield (a, b)
    results.max
    

    Iterator is consumed (i.e. there's a side-effect) lazily, but doesn't try to memoize the collection, meaning that evaluated items can be collected. This ran in a few minutes on my laptop. See in the VisualVM chart how memory is progressively consumed and then reclaimed once the "garbage" start to pile up:

    using an iterator