Search code examples
parsingsyntaxbreakformal-languagesrecursive-descent

Validating a "break" statement with a recursive descent parser


In Crafting Interpreters, we implement a little programming language using a recursive descent parser. Among many other things, it has these statements:

statement → exprStmt
          | ifStmt
          | printStmt
          | whileStmt
          | block ;

block     → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" statement ;
ifStmt    → "if" "(" expression ")" statement ( "else" statement )? ;

One of the exercises is to add a break statement to the language. Also, it should be a syntax error to have this statement outside a loop. Naturally, it can appear inside other blocks, if statements etc. if those are inside a loop.

My first approach was to create a new rule, whileBody, to accept break:

## FIRST TRY
statement → exprStmt
          | ifStmt
          | printStmt
          | whileStmt
          | block ;

block     → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
          | break ;
break     →  "break" ";" ;
ifStmt    → "if" "(" expression ")" statement ( "else" statement )? ;  

But we have to accept break inside nested loops, if conditionals etc. What I could imagine is, I'd need a new rule for blocks and conditionals which accept break:

## SECOND TRY
statement → exprStmt
          | ifStmt
          | printStmt
          | whileStmt
          | block ;

block     → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
          | break
          | whileBlock
          | whileIfStmt
whileBlock→  "{" (declaration | break)* "}" ;
whileIfStmt    → "if" "(" expression ")" whileBody ( "else" whileBody )? ;  
break     →  "break" ";"
ifStmt    → "if" "(" expression ")" statement ( "else" statement )? ;  

It is not infeasible for now, but it can be cumbersome to handle it once the language grows. It is boring and error-prone to write even today!

I looked for inspiration in C and Java BNF specifications. Apparently, none of those specifications prohibit the break outside loop. I guess their parsers have ad hoc code to prevent that. So, I followed suit and added code into the parser to prevent break outside loops.

TL;DR

My questions are:

  1. Would the approach of my second try even work? In other words, could a recursive descent parser handle a break statement that only appears inside loops?
  2. Is there a more practical way to bake the break command inside a syntax specification?
  3. Or the standard way is indeed to change a parser to prevent breaks outside loops while parsing?

Solution

    1. Would the approach of my second try even work? In other words, could a recursive descent parser handle a break statement that only appears inside loops?

    Sure. But you need a lot of duplication. Since while is not the only loop construct, I've used a different way of describing the alternatives, which consists of adding _B to the name of non-terminals which might include break statements.

    declaration    → varDecl
                   | statement
    declaration_B  → varDecl
                   | statement_B
    statement      → exprStmt
                   | ifStmt
                   | printStmt
                   | whileStmt
                   | block
    statement_B    → exprStmt
                   | printStmt
                   | whileStmt
                   | breakStmt
                   | ifStmt_B
                   | block_B
    breakStmt      → "break" ";"
    ifStmt         → "if" "(" expression ")" statement ( "else" statement )?
    ifStmt_B       → "if" "(" expression ")" statement_B ( "else" statement_B )?
    whileStmt      → "while" "(" expression ")" statement_B ;
    block          → "{" declaration* "}"
    block_B        → "{" declaration_B* "}"
    

    Not all statement types need to be duplicated. Non-compound statement like exprStmt don't, because they cannot possibly include a break statement (or any other statement type). And the statement which is the target of a loop statement like whileStmt can always include break, regardless of whether the while was inside a loop or not.

    1. Is there a more practical way to bake the break command inside a syntax specification?

    Not unless your syntax specification has marker macros, like the specification used to describe ECMAScript.

    1. Is there a different way to do this?

    Since this is a top-down (recursive descent) parser, it's pretty straight-forward to handle this condition in the parser's execution. You just need to add an argument to every (or many) parsing functions which specifies whether a break is possible or not. Any parsing function called by whileStmt would set that argument to True (or an enumeration indicating that break is possible), while other statement types would just pass the parameter through, and the top-level parsing function would set the argument to False. The breakStmt implementation would just return failure if it is called with False.