compiler-construction bison flex-lexer

How to use indentation as block delimiters with bison and flex


I wonder how to implement indentation as block delimiters in bison + flex, just like in Python. I'm writing my own programming language ( mostly for fun, but I intend to use it together with a game engine ), and I'll try to come up with something special that minimizes boilerplate and maximizes dev speed.

I have already tried writing a compiler ( actually a `langToy` to NASM translator ) in C, but failed. For some reason it was only able to handle one string in the whole source file ( well, I had been awake for more than 48 hours, so... you know, brain meltdown ).

I don't know if curly brackets and/or begin -> end are easier to implement ( I have no problem doing that ) or if it's just my brain that locks up.

Thanks in advance!


Update: Okay, I have no clue about how to do it with flex. I have problems with returning multiple DEDENTs to the parser. Flex/Bison are relatively new to me.


Update 2: This is the flex file I've come up with so far; it does not quite work:

%x t
%option noyywrap

%{
  int lineno = 0, ntab = 0, ltab = 0, dedent = 0;
%}

%%

<*>\n  { ntab = 0; BEGIN(t); }
<t>\t  { ++ntab; }
<t>.   { int i; /* my compiler complains not c99 if i use for( int i=0... */
         if( ntab > ltab )
           printf("> indent >\n");
         else if( ntab < ltab )
           for( i = 0; i < ltab - ntab; i++ )
             printf("< dedent <\n");
         else
           printf("=        =\n");

         ltab = ntab; ntab = 0;
         BEGIN(INITIAL);
         /* move to next rule */
         REJECT;}
.    /* ignore everything else for now */

%%

main()
{
  yyin = fopen( "test", "r" );
  yylex();
}

You can try to play around with it; maybe you can see what I'm missing. Returning multiple dedents would be easy in Haxe ( return t_dedent( num ); ).

This code doesn't always match the indents/dedents correctly.


Update 3: I think I will give up on flex and do it my own way. If anyone knows how to do it in flex, I would be happy to hear it anyway.


Solution

  • What you need to do is have flex count the amount of whitespace at the beginning of every line and insert an appropriate number of INDENT/UNINDENT tokens for the parser to use to group things. One question is what you want to do about tabs vs spaces -- do you just want them to be equivalent with fixed tab stops, or do you want to require indenting to be consistent (so if one line begins with a tab and the next with a space, you signal an error)? The second option is probably a little harder.

    Assuming you want fixed 8-column tabstops, you can use something like

    %{
    /* globals to track current indentation */
    int current_line_indent = 0;   /* indentation of the current line */
    int indent_level = 0;          /* indentation level passed to the parser */
    %}
    
    %x indent /* start state for parsing the indentation */
    %s normal /* normal start state for everything else */
    
    %%
    <indent>" "      { current_line_indent++; }
    <indent>"\t"     { current_line_indent = (current_line_indent + 8) & ~7; }
    <indent>"\n"     { current_line_indent = 0; /*ignoring blank line */ }
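    <indent>.        {
                       /* First non-blank character of the line. Push it back so
                          this rule matches again on the next yylex() call; one
                          token is returned per call until the counters agree.
                          Note that indent_level moves one step per token while
                          current_line_indent counts columns, so a block indented
                          by N columns produces N INDENT tokens. */
                       unput(*yytext);
                       if (current_line_indent > indent_level) {
                           indent_level++;
                           return INDENT;
                       } else if (current_line_indent < indent_level) {
                           indent_level--;
                           return UNINDENT;
                       } else {
                           BEGIN normal;
                       }
                     }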
    
    <normal>"\n"     { current_line_indent = 0; BEGIN indent; }
    ... other flex rules ...
    

    You do have to make sure you start the parse in indent mode (to get the indentation on the first line).