Is there a convenient way to use Scala's parser combinators to parse languages where indentation is significant? (e.g. Python)
Let's assume we have a very simple language where this is a valid program:
block
    inside
    the
    block
and we want to parse this into a List[String], with each line inside the block as one String.
We first define a method that takes a minimum indentation level and returns a parser for a line indented more deeply than that level, that is, a line starting with at least minIndent + 1 whitespace characters.
def line(minIndent: Int): Parser[String] =
  repN(minIndent + 1, "\\s".r) ~ ".*".r ^^ { case s ~ r => s.mkString + r }
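To see what this parser accepts and rejects, here is a minimal sketch. It assumes the scala-parser-combinators module is on the classpath; the object name LineDemo and the sample inputs are made up for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object, named LineDemo for this sketch only.
object LineDemo extends RegexParsers {
  // Indentation is significant, so whitespace must not be skipped automatically.
  override def skipWhitespace = false

  // A line that starts with at least minIndent + 1 whitespace characters.
  def line(minIndent: Int): Parser[String] =
    repN(minIndent + 1, "\\s".r) ~ ".*".r ^^ { case s ~ r => s.mkString + r }

  def main(args: Array[String]): Unit = {
    // Four leading spaces satisfy a minimum indentation of 0.
    println(parse(line(0), "    inside"))
    // No leading whitespace: the first "\\s" fails, so the line is rejected.
    println(parse(line(0), "outside"))
    // Two leading spaces are not enough when the minimum is 2 (three are required).
    println(parse(line(2), "  shallow"))
  }
}
```

Note that the parsed string keeps its leading whitespace, because the matched indentation is glued back onto the rest of the line.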
Then we define a block with a minimum indentation level by repeating the line parser with a suitable separator between lines.
def lines(minIndent: Int): Parser[List[String]] =
  // Try "\r\n" first so a Windows line ending is consumed as a single separator.
  rep1sep(line(minIndent), "\r\n|[\n\r]".r)
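The following sketch shows how repeating the line parser ends a block at the first line that is not indented deeply enough. It assumes the scala-parser-combinators module is on the classpath; the object name LinesDemo is hypothetical, and the separator is simplified to a single "\n" for this example.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object, named LinesDemo for this sketch only.
object LinesDemo extends RegexParsers {
  override def skipWhitespace = false

  def line(minIndent: Int): Parser[String] =
    repN(minIndent + 1, "\\s".r) ~ ".*".r ^^ { case s ~ r => s.mkString + r }

  // Assumption for this sketch: lines are separated by a single "\n" only.
  def lines(minIndent: Int): Parser[List[String]] =
    rep1sep(line(minIndent), literal("\n"))

  def main(args: Array[String]): Unit = {
    // Two indented lines followed by one unindented line.
    val result = parse(lines(0), "  a\n  b\nc")
    // The unindented "c" is not consumed; parsing stops before the newline preceding it.
    println(result)
    println("stopped at offset " + result.next.offset)
  }
}
```

Because rep1sep backtracks when the line after a separator fails, the remaining input begins at the newline that follows the last indented line.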
Now we can define a parser for our little language like this:
val block: Parser[List[String]] =
  (("\\s*".r <~ "block\\n".r) ^^ { _.size }) >> lines
It first determines the current indentation level and then passes that as the minimum to the lines parser. Let's test it:
val s =
"""block
    inside
    the
    block
outside
the
block"""
println(block(new CharSequenceReader(s)))
And we get
[4.10] parsed: List(    inside,     the,     block)
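The key step is the >> operator, which feeds the result of one parser into a function that builds the next parser. That is what lets the indentation found at run time decide how the following lines are parsed. As a smaller illustration of the same mechanism, here is a sketch with a made-up count-prefixed format; the object name IntoDemo and the format itself are hypothetical, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object; the count-prefixed format is invented for illustration.
object IntoDemo extends RegexParsers {
  override def skipWhitespace = false

  // Reads a decimal count n, then requires exactly n occurrences of "a".
  // The second parser is built from the value produced by the first one.
  val counted: Parser[List[String]] =
    ("[0-9]+".r ^^ { _.toInt }) >> { n => repN(n, literal("a")) }

  def main(args: Array[String]): Unit = {
    println(parseAll(counted, "3aaa")) // succeeds with three elements
    println(parseAll(counted, "3aa"))  // fails: one "a" is missing
  }
}
```

In the block parser the value passed along is the width of the leading whitespace instead of a count, and the function receiving it is lines.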
For all of this to compile, you need these imports:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
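And you need to put everything into an object that extends RegexParsers, with automatic whitespace skipping turned off so that the indentation actually reaches the parsers, like so: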
object MyParsers extends RegexParsers {
  override def skipWhitespace = false
  ....