Search code examples
antlrcompiler-theory

Removing Left Recursion in ANTLR


As is explained in Removing left recursion , there are two ways to remove the left recursion.

  • Modify the original grammar to remove the left recursion using some procedure
  • Write the grammar originally not to have the left recursion

What people normally use for removing (not having) the left recursion with ANTLR? I've used flex/bison for parser, but I need to use ANTLR. The only thing I'm concerned about using ANTLR (or LL parser in genearal) is left recursion removal.

  • In practical sense, how serious of removing left recursion in ANTLR? Is this a showstopper in using ANTLR? Or, nobody cares about it in ANTLR community?
  • I like the idea of AST generation of ANTLR. In terms of getting AST quick and easy way, which method (out of the 2 removing left recursion methods) is preferable?

Added

I did some experiment with the following grammar.

E -> E + T|T
T -> T * F|F
F -> INT | ( E )

After left recursion removal, I get the following one

E -> TE'
E' -> null | + TE'
T -> FT'
T' -> null | * FT'

I could come up with the following ANTLR representation. Even though, It's relatively pretty simple and straightforward, it seems the grammar that doesn't have the left recursion should be the better way to go.

grammar T;

options {
    language=Python;
}

start returns [value]
   : e {$value = $e.value};
e returns [value]
   : t ep  
     {
       $value = $t.value
       if $ep.value != None:
         $value += $ep.value
     }
   ;
ep returns [value]
   : {$value = None}
   | '+' t r = ep 
     {
       $value = $t.value
       if $r.value != None:
            $value += $r.value
     }
   ;
t returns [value]
  : f tp 
    {
      $value = $f.value
      if $tp.value != None:
        $value *= $tp.value
    }
  ;
tp returns [value]
  : {$value = None}
  | '*' f r = tp 
    {
      $value = $f.value;
      if $r.value != None:
        $value *= $r.value
    }
  ;
f returns [int value]
  : INT {$value = int($INT.text)}
  | '(' e ')' {$value = $e.value}
  ;

INT :   '0'..'9'+ ;
WS: (' '|'\n'|'\r')+ {$channel=HIDDEN;} ;

Solution

  • Consider something like a typical parameter list:

    parameter_list: parameter
                  | parameter_list ',' parameter
                  ;
    

    Since you don't care about anything like precedence or associativity with parameters, this is fairly easy to convert to right recursion, at the expense of adding an extra production:

    parameter_list: parameter more_params
                  ;
    
    more_params:
               | ',' parameter more_params
               ;
    

    For the most serious cases, you might want to spend some time in the Dragon Book. Doing a quick check, this is covered primarily in chapter 4.

    As far as seriousness goes, I'm pretty sure ANTLR simply won't accept a grammar that contains left recursion, which would put it into the "absolute necessity" category.