Search code examples
parsingcompiler-constructionabstract-syntax-treeinterpreter

How are the continue/break statements parsed/represented in ASTs?


I'm attempting to implement for loops into my interpreter, and now am attempting to implement the one token statements continue and break into the parser.

Considering my lexer,

Lexer

"break" -> TOKEN::TBREAK
"continue" -> TOKEN::TCONTINUE

I'm considering two ways of implementing them via parsing. One is to have a two different nodes that will be inserted into the AST called BreakNode and ContinueNode:

Parser 1

BreakGram    -> TOKEN::TBREAK ';' = {new BreakNode();}
ContinueGram -> TOKEN::TCONTINUE ';' = {new ContinueNode();}

The other way is to have one shared node for one token statement nodes, inserting the token type as a specifier for what type of statement it would be:

Parser 2

OneTokenStatement ->  TOKEN::TBREAK ';' = {new OneTokenStatement(TOKEN::TBREAK);}
                    | TOKEN::TCONTINUE ';' = {new OneTokenStatement(TOKEN::TCONTINUE);}

How do other grammars for other languages such as Java/Python handle these two tokens? Are they represented with the different nodes in the AST (like example 1) or with the same node (example 2).


Solution

  • It's a judgement call, and I'm sure that both solutions are found in different projects.

    The semantics for break and continue are certainly similar enough that they could be handled by a single function; they both translate into goto statements, although obviously to different places. (Also, probably, they exit different scopes.)

    Note that they are not single token statements in all languages -- some languages let you specify which loop rather than always using the innermost one -- and there is also an important difference in some C family languages, in that break applies to switch blocks but continue doesn't.

    Personally, I'd probably go with the "different node types" solution, but I don't think it will make much difference.