I'm attempting to implement for loops into my interpreter, and now am attempting to implement the one token statements continue
and break
into the parser.
Considering my lexer,
Lexer
"break" -> TOKEN::TBREAK
"continue" -> TOKEN::TCONTINUE
I'm considering two ways of implementing them via parsing.
One is to have a two different nodes that will be inserted into the AST called BreakNode
and ContinueNode
:
Parser 1
BreakGram -> TOKEN::TBREAK ';' = {new BreakNode();}
ContinueGram -> TOKEN::TCONTINUE ';' = {new ContinueNode();}
The other way is to have one shared node for one token statement nodes, inserting the token type as a specifier for what type of statement it would be:
Parser 2
OneTokenStatement -> TOKEN::TBREAK ';' = {new OneTokenStatement(TOKEN::TBREAK);}
| TOKEN::TCONTINUE ';' = {new OneTokenStatement(TOKEN::TCONTINUE);}
How do other grammars for other languages such as Java/Python handle these two tokens? Are they represented with the different nodes in the AST (like example 1) or with the same node (example 2).
It's a judgement call, and I'm sure that both solutions are found in different projects.
The semantics for break
and continue
are certainly similar enough that they could be handled by a single function; they both translate into goto
statements, although obviously to different places. (Also, probably, they exit different scopes.)
Note that they are not single token statements in all languages -- some languages let you specify which loop rather than always using the innermost one -- and there is also an important difference in some C family languages, in that break
applies to switch
blocks but continue
doesn't.
Personally, I'd probably go with the "different node types" solution, but I don't think it will make much difference.