I have been trying to implement python style indentation in my bison grammar, some insights or ideas to that implementation would be nice.
The best way is usually to have the lexer track the indentation level and insert INDENT/UNINDENT tokens into the token stream appropriately. Here's some flex code I have to do that:
%x LINESTART
%s NORMAL
%{
static std::stack<int> indent;
static int indent_depth(const char *);
%}
%%
<INITIAL>.*|\n { yyless(0); BEGIN(LINESTART); indent.push(0); }
<LINESTART>[ \t]* { int depth = indent_depth(yytext);
if (depth < indent.top()) {
indent.pop();
yyless(0);
return UNINDENT; }
BEGIN(NORMAL);
if (depth > indent.top()) {
indent.push(depth);
return INDENT; } }
<LINESTART>. { yyless(0);
if (indent.top() > 0) {
indent.pop();
return UNINDENT; }
BEGIN(NORMAL); }
<LINESTART><<EOF>> { if (indent.top() > 0) {
indent.pop();
return UNINDENT; }
BEGIN(NORMAL); }
<LINESTART>[ \t]*\n { lineno++; }
<LINESTART>[ \t]*#.*\n { lineno++; }
[[({] { parens++; return *yytext; }
[])}] { if (--parens < 0) parens = 0;
return *yytext; }
\n { lineno++;
if (parens == 0) BEGIN(LINESTART); }
This code is somewhat tricky with special cases -- for example, you need to ignore blank lines and lines with just comments, and you probably also want to ignore indentation within unbalanced parenthesis (which the above does).