
Techniques for parsing code blocks without curly braces


I'm writing a simple parser/interpreter in C# from scratch (no third-party libraries). It compiles to bytecode, and then I have a class that runs the bytecode. I'm getting close to wrapping it up. I've just implemented while and for loops and am working on if / else if / else blocks.

As it stands, my parser requires all of these structures to use curly braces. I'd like to make it more C-like and have the curly braces be optional when the block contains just a single statement. This is giving me trouble.

if (condition)
{
    // Make curly braces optional when there is just one statement here
}

The problem is tracking state. How does the parser know when a block without curly braces has ended? One approach would be to check, after each and every statement, whether a block without braces is in effect. However, many different constructs count as a statement, so those checks would have to be scattered across a number of places. That feels a little brittle to me.

I'm just wondering if anyone has done this and knows of any slick tricks for tracking when a code block ends when there are no curly braces.


Solution

  • You should look into recursive descent parsing. It makes writing parsers a lot easier. Let's assume you have a grammar like this:

    statement
       : 'if' paren_expr ( '{' statement* '}' | statement )
    
    paren_expr
       : '(' expr ')'
    

    Then, with recursive descent, you can write something like:

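    public void Statement()
    {
        if (curToken == Token.If)
        {
            Eat(Token.If); // Eat checks the current token and moves the token pointer on
            ParenExpr();   // consumes '(' expr ')'
            if (curToken == Token.LBrace) // this signifies a block of statements
            {
                Eat(Token.LBrace);
                // Token.EOF is an assumed end-of-input token; it stops a missing '}' from looping forever
                while (curToken != Token.RBrace && curToken != Token.EOF)
                    Statement();
                Eat(Token.RBrace); // reports an error if the input ended before '}'
            }
            else
                Statement(); // no braces: the body is exactly one statement
        }
        // ... handle the other statement kinds (while, for, expression statements) here
    }
    
    public void ParenExpr()
    {
        Eat(Token.LParen);
        // parse the expression and do the other token checks here
        Eat(Token.RParen);
    }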
    

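    Doing this for all of your nonterminals, you can build up an AST and then generate your bytecode from it. Note that no extra state is needed to know where a braceless body ends: the recursive Statement() call simply returns after it has parsed one statement.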
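Below is a minimal, self-contained sketch of the same idea that also covers else and else if. It is not the asker's language: it assumes a hypothetical toy syntax where conditions and expression statements are bare identifiers, and every name in it (Parser, Stmt, IfStmt, BlockStmt, ExprStmt, AstPrinter, the "<eof>" sentinel) is made up for illustration. Unlike the snippet above, it treats a braced block as just another kind of statement, so an if body, an else body, and a block member are all parsed by the same Statement() call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical AST for a toy language where conditions and expression statements are bare identifiers.
abstract record Stmt { }
sealed record IfStmt(string Condition, Stmt Then, Stmt Else) : Stmt; // Else is null when there is no else
sealed record BlockStmt(List<Stmt> Body) : Stmt;
sealed record ExprStmt(string Name) : Stmt;

sealed class Parser
{
    private const string Eof = "<eof>"; // sentinel the tokenizer can never produce
    private readonly List<string> tokens = new List<string>();
    private int pos;

    public Parser(string source)
    {
        // Tokens are identifiers/keywords or single non-whitespace characters.
        foreach (Match m in Regex.Matches(source, @"[A-Za-z_]\w*|\S"))
            tokens.Add(m.Value);
        tokens.Add(Eof);
    }

    private string Cur => tokens[pos];

    // Consumes the expected token or reports a syntax error.
    private void Eat(string expected)
    {
        if (Cur != expected)
            throw new FormatException($"Expected '{expected}' but found '{Cur}'");
        pos++;
    }

    private string EatIdentifier()
    {
        string name = Cur;
        if (name == "if" || name == "else" || !Regex.IsMatch(name, @"^[A-Za-z_]\w*$"))
            throw new FormatException($"Expected an identifier but found '{name}'");
        pos++;
        return name;
    }

    public List<Stmt> ParseProgram()
    {
        var program = new List<Stmt>();
        while (Cur != Eof)
            program.Add(Statement());
        return program;
    }

    private Stmt Statement()
    {
        if (Cur == "if")
        {
            Eat("if");
            Eat("(");
            string condition = EatIdentifier();
            Eat(")");
            Stmt then = Statement();      // exactly one statement; a braced block counts as one
            Stmt otherwise = null;
            if (Cur == "else")            // the innermost unfinished if takes the else (dangling else)
            {
                Eat("else");
                otherwise = Statement();  // "else if" is just an else whose statement is another if
            }
            return new IfStmt(condition, then, otherwise);
        }
        if (Cur == "{")
        {
            Eat("{");
            var body = new List<Stmt>();
            while (Cur != "}" && Cur != Eof)
                body.Add(Statement());
            Eat("}");                     // throws if the input ended before the closing brace
            return new BlockStmt(body);
        }
        string name = EatIdentifier();    // expression statement: identifier ';'
        Eat(";");
        return new ExprStmt(name);
    }
}

static class AstPrinter
{
    // Renders the AST as S-expressions so the nesting is easy to inspect.
    public static string Show(Stmt s) => s switch
    {
        IfStmt i when i.Else == null => $"(if {i.Condition} {Show(i.Then)})",
        IfStmt i => $"(if {i.Condition} {Show(i.Then)} {Show(i.Else)})",
        BlockStmt b => "(block" + string.Concat(b.Body.Select(x => " " + Show(x))) + ")",
        ExprStmt e => e.Name,
        _ => throw new ArgumentException("Unknown statement type")
    };

    public static string ShowProgram(List<Stmt> program) => string.Join(" ", program.Select(s => Show(s)));
}
```

For example, AstPrinter.ShowProgram(new Parser("if (a) x; else if (b) { y; z; } else w; v;").ParseProgram()) yields "(if a x (if b (block y z) w)) v", and "if (a) if (b) x; else y;" yields "(if a (if b x y))": the else attaches to the inner if simply because that is the call still running when the parser sees it. Treating the block as a statement keeps the grammar to one rule per construct; the same Statement() call would also serve as the body of while and for.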