Tags: python, scala, generator, yield, text-processing

What is the preferred way to implement 'yield' in Scala?


I am writing code for PhD research and am starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.

Yes, I know there have been previous threads on this. But the solutions in them look hacked-up (or at least badly explained), are not clearly shown to work well, and often have unclear limitations. I would like to write code something like this:

import generator._

def yield_values(file:String) = {
  generate {
    for (x <- Source.fromFile(file).getLines()) {
      // Scala is already using the 'yield' keyword.
      give("something")
      for (field <- ":".r.split(x)) {
        if (field contains "/") {
          for (subfield <- "/".r.split(field)) { give(subfield) }
        } else {
          // Scala has no 'continue'.  IMO that should be considered
          // a bug in Scala.
          // Preferred: if (field.startsWith("#")) continue
          // Actual: Need to indent all following code
          if (!field.startsWith("#")) {
            val some_calculation = { ... do some more stuff here ... }
            if (some_calculation && field.startsWith("r")) {
              give("r")
              give(field.drop(1))
            } else {
              // Typically there will be a good deal more code here to handle different cases
              give(field)
            }
          }
        }
      }
    }
  }
}

I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.

I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.

Please, I would prefer not to get any of the following answers:

  1. 'yield' sucks, continuations are better. (Yes, in general you can do more with continuations. But they are hella hard to understand, and 99% of the time an iterator is all you want or need. If Scala provides lots of powerful tools but they're too hard to use in practice, the language won't succeed.)
  2. This is a duplicate. (Please see my comments above.)
  3. You should rewrite your code using streams, continuations, recursion, etc. etc. (Please see #1. I will also add, technically you don't need for loops either. For that matter, technically you can do absolutely everything you ever need using SKI combinators.)
  4. Your function is too long. Break it up into smaller pieces and you won't need 'yield'. You'd have to do this in production code, anyway. (First, "you won't need 'yield'" is doubtful in any case. Second, this isn't production code. Third, for text processing like this, very often, breaking the function into smaller pieces -- especially when the language forces you to do this because it lacks the useful constructs -- only makes the code harder to understand.)
  5. Rewrite your code with a function passed in. (Technically, yes you can do this. But the result is no longer an iterator, and chaining iterators is much nicer than chaining functions. In general, a language should not force me to write in an unnatural style -- certainly, the Scala creators believe this in general, since they provide shitloads of syntactic sugar.)
  6. Rewrite your code in this, that, or the other way, or some other cool, awesome way I just thought of.

Solution

  • The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?

    Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.

    Here's how you do it.

    // You want to write
    for (x <- xs) { /* complex yield in here */ }
    // Instead you write
    xs.iterator.flatMap { /* Produce iterators in here */ }
    
    // You want to write
    yield(a)
    yield(b)
    // Instead you write
    Iterator(a,b)
    
    // You want to write
    yield(a)
    /* complex set of yields in here */
    // Instead you write
    Iterator(a) ++ /* produce complex iterator here */
    

    That's it! All your cases can be reduced to one of these three.
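
    The three rewrites above can be tried out in isolation. The following is only a sketch: the object name `YieldPatterns`, the method names, and the sample values are made up for illustration and are not part of the question's code.

```scala
object YieldPatterns {
  // Rule 1: a for loop whose body yields becomes flatMap over an iterator.
  // Python equivalent: for x in xs: yield x; yield x * 10
  def loopWithYields(xs: Seq[Int]): Iterator[Int] =
    xs.iterator.flatMap(x => Iterator(x, x * 10))

  // Rule 2: consecutive yields become a single Iterator literal.
  def twoYields(a: String, b: String): Iterator[String] =
    Iterator(a, b)

  // Rule 3: one yield followed by a more complex set of yields becomes
  // a one-element iterator concatenated with the iterator for the rest.
  def yieldThenMore(a: Int, rest: Seq[Int]): Iterator[Int] =
    Iterator(a) ++ rest.iterator.flatMap(r => Iterator(r, -r))
}
```

    Each method returns an Iterator, so elements are produced on demand as the caller pulls them, which is the property that Python's yield gives you.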

    In your case, your example would look something like

    Source.fromFile(file).getLines().flatMap(x =>
      Iterator("something") ++
      ":".r.split(x).iterator.flatMap(field =>
        if (field contains "/") "/".r.split(field).iterator
        else {
          if (!field.startsWith("#")) {
            /* vals, whatever */
            if (some_calculation && field.startsWith("r")) Iterator("r", field.drop(1))
            else Iterator(field)
          }
          else Iterator.empty
        }
      )
    )
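
    To experiment with this without a file on disk, the same shape can be written over any Iterator[String] of lines. This is a sketch under assumptions: `FieldTokens`, `tokens`, and `someCalculation` are hypothetical names, and `someCalculation` is only a stand-in for the elided calculation in the question (here it just checks the field length).

```scala
object FieldTokens {
  // Hypothetical stand-in for the elided "some_calculation";
  // the real logic is not given in the question.
  def someCalculation(field: String): Boolean = field.length > 1

  // Same structure as the code above, but taking the lines as a parameter
  // so it can be fed from a file, a list, or anything else.
  def tokens(lines: Iterator[String]): Iterator[String] =
    lines.flatMap { x =>
      Iterator("something") ++
        ":".r.split(x).iterator.flatMap { field =>
          if (field.contains("/")) "/".r.split(field).iterator
          else if (field.startsWith("#")) Iterator.empty[String] // the "continue" case
          else if (someCalculation(field) && field.startsWith("r"))
            Iterator("r", field.drop(1))
          else Iterator(field)
        }
    }
}
```

    For a real file you would pass `Source.fromFile(file).getLines()` (with `import scala.io.Source`) as the argument, and close the Source once the iterator has been consumed.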
    

    P.S. Scala does have a way to get continue: put breakable inside the loop body, like so (break is implemented by throwing a stackless, lightweight exception):

    import scala.util.control.Breaks._
    for (blah) { breakable { ... break ... } }
    

    but that won't get you what you want because Scala doesn't have the yield you want.
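
    For completeness, here is the breakable-as-continue idiom spelled out as something that compiles. It is a minimal sketch: `ContinueDemo` and `nonComments` are made-up names, and the loop just collects fields that do not start with "#".

```scala
import scala.util.control.Breaks.{break, breakable}
import scala.collection.mutable.ListBuffer

object ContinueDemo {
  // Keeps the fields that do not start with '#', skipping the others the
  // way `continue` would in Python.
  def nonComments(fields: Seq[String]): List[String] = {
    val kept = ListBuffer.empty[String]
    for (field <- fields) {
      breakable { // breakable INSIDE the loop body acts like continue
        if (field.startsWith("#")) break()
        kept += field
      }
    }
    kept.toList
  }
}
```

    Wrapping the whole for loop in breakable instead would give the usual break behaviour. In the flatMap style shown earlier, the same skip is expressed by returning Iterator.empty for the unwanted field, so no control-flow helper is needed there.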