kotlin parsing grammar context-free-grammar

Simple arithmetic parser created with Kotlin and better-parse fails

Please have a look at the following parser I implemented in Kotlin using the parser combinator library better-parse. It parses—should parse, rather—simple arithmetic expressions:

object CalcGrammar: Grammar<Int>() {
    val num by regexToken("""\d+""")
    val pls by literalToken("+")
    val min by literalToken("-")
    val mul by literalToken("*")
    val div by literalToken("/")
    val lpr by literalToken("(")
    val rpr by literalToken(")")
    val wsp by regexToken("\\s+", ignore = true)

    // expr ::= term + expr | term - expr | term
    // term ::= fact * term | fact / term | fact
    // fact ::= (expr) | -fact | int

    val fact: Parser<Int> by
        skip(lpr) and parser(::expr) and skip(rpr) or
        (skip(min) and parser(::fact) map { -it }) or
        (num map { it.text.toInt() })
    val term by
        leftAssociative(fact, mul) { a, _, b -> a * b } or
        leftAssociative(fact, div) { a, _, b -> a / b } or
        fact
    val expr by
        leftAssociative(term, pls) { a, _, b -> a + b } or
        leftAssociative(term, min) { a, _, b -> a - b } or
        term

    override val rootParser by expr
}

However, when I parse -2 + 4 - 5 + 6, I get this ParseException:

Could not parse input: UnparsedRemainder(startsWith=min@8 for "-" at 7 (1:8))

At first, I thought the issue was that I have not codified the self-recursion in the expr productions, i.e., the code does not accurately represent the grammar:

// Reference to `expr` is missing on the right-hand side...
val expr = ... leftAssociative(term, min) ...

// ...although the corresponding production defines it.
// expr ::= ... term - expr ...

But then I noticed that the official example for an arithmetic parser provided as part of the library's documentation—which turns out to be almost identical afaict—also omits this.

What did I do wrong if not this? And how can I make it work?

Solution

I am not entirely familiar with this library, but it seems your translation from grammar to code is too literal for the library's syntax and the library actually implicitly handles much of what you are explicitly writing, and it seems that as a part of this "ease of use" it breaks what is seemingly correct code.

For starters, you will find that your code behaves the exact same way whether or not you chain or term on to the end of expr, and or fact on to the end of term.

Based on this I was able to come to the conclusion that these ors are not working as expected when chaining together the leftAssociatives, in fact you would run in to the same problem had you attempted to parse a division. I believe it is for this reason that the example you provided a link to combines addition and subtraction (as with multiplication and division) into a single, more dynamic, leftAssociative call. And if you copy that same work to your own code, it runs flawlessly:

object CalcGrammar: Grammar<Int>() {
    val num by regexToken("""\d+""")
    val pls by literalToken("+")
    val min by literalToken("-")
    val mul by literalToken("*")
    val div by literalToken("/")
    val lpr by literalToken("(")
    val rpr by literalToken(")")
    val wsp by regexToken("\\s+", ignore = true)

    // expr ::= term + expr | term - expr | term
    // term ::= fact * term | fact / term | fact
    // fact ::= (expr) | -fact | int

    val fact: Parser<Int> by
      skip(lpr) and parser(::expr) and skip(rpr) or
          (skip(min) and parser(::fact) map { -it }) or
          (num map { it.text.toInt() })

    private val term by
      leftAssociative(fact, div or mul use { type }) { a, op, b ->
          if (op == div) a / b else a * b
      }

    private val expr by
      leftAssociative(term, pls or min use { type }) { a, op, b ->
          if (op == pls) a + b else a - b
      }

    override val rootParser by expr
}