Search code examples
antlrgrammarantlrworks

using antlrworks to solve left-recursive


Hi I want to write a grammer( Using ANTLRWORKS ) that accept later ( in debugging mode ) this code

repeat_until
    :'repeat' seq_statement 'until' exp
    ;

read    :

          'read' ID ';'  
    ;

    fragment    
Operation_stat
    :   (NUMBER|ID) OP (NUMBER|ID) 
    ;

OP  :   ('+'|'-'|'*'|'/')
    ;

NUMBER  :
'0'..'9'+   
    ;

LOG_OP  :
('<' | '>' | '=' | '<=' | '>=' )
    ;


ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;



FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;




fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

Thanx for your help


Solution

  • I believe ANLRWorks has a feature to help remove left-recursion from a grammar, although, in my memory, it only works with very basic grammars. It's been a while since I last worded with it, so you have to investigate yourself on that front.

    To manually remove left-recursion, see: http://www.antlr.org/wiki/display/ANTLR3/Left-Recursion+Removal (make sure to go through all 3 sections)

    EDIT

    I'm not sure if I can help you: you seem to be totally missing the point that ANTLR can't cope with left-recursive grammars. Your following parser rules:

    seq_statement 
      :  seq_statement ';' statement 
      |  seq_statement
      ;
    
    simple_exp
      :  simple_exp OP term 
      |  term    
      ;
    
    term    
      :  term OP factor factor 
      |  factor  
      ;
    

    are all so obviously left recursive, that I am not sure how to explain this any clearer. I mean, can't you see what's wrong with a rule like:

    a
      : a b
      ;
    

    ? Which is basically the same as your seq_statement rule.

    I get the impression you're trying to convert some existing grammar into an ANTLR grammar. Is this the case? And do you really know what left-recursion really means?

    EDIT II

    Something like:

    parse
      :  block EOF 
      ;
    
    block
      :  statement (';' statement)* ';'? 
      ;
    
    statement
      :  'read' expression  
      |  'write' expression 
      |  ifStatement
      |  repeatStatement
      |  assignment
      ;
    
    ifStatement
      :  'if' expression 'then' block? ('else' block?)? 'end' 
      ;
    
    repeatStatement
      :  'repeat' block? 'until' expression 
      ;
    
    assignment
      :  Identifier ':=' expression 
      ;
    
    expression
      :  equalityExp
      ;
    
    equalityExp
      :  relationalExp (('=' | '!=') relationalExp)*
      ;
    
    relationalExp
      :  additiveExp (('>=' | '<=' | '>' | '<') additiveExp)*
      ;
    
    additiveExp
      :  multiplicativeExp (('+' | '-') multiplicativeExp)*
      ;
    
    multiplicativeExp
      :  atom (('*' | '/' | '%') atom)*
      ;
    
    atom
      :  Identifier
      |  Int
      |  '(' expression ')' 
      ;
    
    Int
      :  '0'..'9'+
      ;
    
    Identifier
      :  'a'..'z'+
      ;
    
    Space
      :  (' ' | '\t' | '\r' | '\n') {skip();}
      ;
    

    ought to do the trick.