Search code examples
bisonyacccpython

How to implement python style indentation in yacc or bison?


I have been trying to implement python style indentation in my bison grammar, some insights or ideas to that implementation would be nice.


Solution

  • The best way is usually to have the lexer track the indentation level and insert INDENT/UNINDENT tokens into the token stream appropriately. Here's some flex code I have to do that:

    %x LINESTART
    %s NORMAL
    %{
    static std::stack<int> indent;
    static int indent_depth(const char *);
    %}
    
    %%
    
    <INITIAL>.*|\n          { yyless(0); BEGIN(LINESTART); indent.push(0); }
    <LINESTART>[ \t]*       { int depth = indent_depth(yytext);
                              if (depth < indent.top()) {
                                  indent.pop();
                                  yyless(0);
                                  return UNINDENT; }
                              BEGIN(NORMAL);
                              if (depth > indent.top()) {
                                  indent.push(depth);
                                  return INDENT; } }
    <LINESTART>.            { yyless(0);
                              if (indent.top() > 0) {
                                  indent.pop();
                                  return UNINDENT; }
                              BEGIN(NORMAL); }
    <LINESTART><<EOF>>      { if (indent.top() > 0) {
                                  indent.pop();
                                  return UNINDENT; }
                              BEGIN(NORMAL); }
    <LINESTART>[ \t]*\n     { lineno++; }
    <LINESTART>[ \t]*#.*\n  { lineno++; }
    [[({]                   { parens++; return *yytext; }
    [])}]                   { if (--parens < 0) parens = 0;
                              return *yytext; }
    \n                      { lineno++;
                              if (parens == 0) BEGIN(LINESTART); }
    

    This code is somewhat tricky with special cases -- for example, you need to ignore blank lines and lines with just comments, and you probably also want to ignore indentation within unbalanced parenthesis (which the above does).