I am doing writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.
Yes I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:
import generator._
def yield_values(file:String) = {
generate {
for (x <- Source.fromFile(file).getLines()) {
# Scala is already using the 'yield' keyword.
give("something")
for (field <- ":".r.split(x)) {
if (field contains "/") {
for (subfield <- "/".r.split(field)) { give(subfield) }
} else {
// Scala has no 'continue'. IMO that should be considered
// a bug in Scala.
// Preferred: if (field.startsWith("#")) continue
// Actual: Need to indent all following code
if (!field.startsWith("#")) {
val some_calculation = { ... do some more stuff here ... }
if (some_calculation && field.startsWith("r")) {
give("r")
give(field.slice(1))
} else {
// Typically there will be a good deal more code here to handle different cases
give(field)
}
}
}
}
}
}
}
I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.
I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.
Please, I would prefer not to get any of the following answers:
The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?
Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.
Here's how you do it.
// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }
// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)
// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */
That's it! All your cases can be reduced to one of these three.
In your case, your example would look something like
Source.fromFile(file).getLines().flatMap(x =>
Iterator("something") ++
":".r.split(x).iterator.flatMap(field =>
if (field contains "/") "/".r.split(field).iterator
else {
if (!field.startsWith("#")) {
/* vals, whatever */
if (some_calculation && field.startsWith("r")) Iterator("r",field.slice(1))
else Iterator(field)
}
else Iterator.empty
}
)
)
P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):
import scala.util.control.Breaks._
for (blah) { breakable { ... break ... } }
but that won't get you what you want because Scala doesn't have the yield you want.