Tags: java, token, grammar, antlr4, rule

Antlr4 doesn't recognize identifiers


I'm trying to create a grammar which parses a file line by line.

grammar Comp;

options 
{
    language = Java;
}

@header {
    package analyseur;
    import java.util.*;
    import component.*;
}

@parser::members {
    /** Line to write in the new java file */
    public String line;
}

start   
        : objectRule        {System.out.println("OBJ");  line = $objectRule.text;}
        | anyString         {System.out.println("ANY");  line = $anyString.text;}
        ;

objectRule : ObjectKeyword ID ;

anyString : ANY_STRING ;


ObjectKeyword :  'Object' ;
ID  :   [a-zA-Z]+ ;
ANY_STRING :  (~'\n')+ ;
WhiteSpace : (' '|'\t') -> skip;

When I feed the input 'Object o' to the grammar, the output is ANY instead of OBJ.

'Object o'   =>  'ANY'   // I would like OBJ

I know ANY_STRING matches a longer string, but I declared the lexer rules in order, with ObjectKeyword and ID before ANY_STRING. What is the problem?

Thank you very much for your help! ;)


Solution

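  • For lexer rules, the rule with the longest match wins, regardless of rule order. Only when two rules match the same length does the rule listed first win. Here ANY_STRING matches the whole input 'Object o' (8 characters, because (~'\n')+ also accepts the space), while ObjectKeyword and ID match only 'Object' (6 characters), so ANY_STRING wins.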

    To make rule order meaningful, reduce the possible match length of the ANY_STRING rule so that it is the same as or less than that of any keyword or ID, and keep it listed after ObjectKeyword and ID:

    ANY_STRING: ~( ' ' | '\n' | '\t' ) ; // also?: '\r' | '\f' | '_' 
    

    Update

    To see what the lexer is actually doing, dump the token stream, for example by running the TestRig (grun) with the -tokens option.
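The longest-match rule and the effect of the reduced ANY_STRING rule can be checked without running ANTLR at all. The Java sketch below is a hypothetical stand-in, not ANTLR's actual implementation (ANTLR uses its own ATN/DFA simulation): the class MaxMunchDemo and its names are made up for illustration, and each lexer rule is approximated by a java.util.regex pattern. At each position it tries every rule in declaration order, keeps the longest match, breaks ties in favour of the rule listed first, drops skipped tokens, and prints the resulting token stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxMunchDemo {

    // One lexer rule: token name, a java.util.regex approximation of the ANTLR rule,
    // and whether matches are discarded (ANTLR's "-> skip").
    public static final class Rule {
        final String name;
        final Pattern pattern;
        final boolean skip;

        Rule(String name, String regex, boolean skip) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.skip = skip;
        }
    }

    // The question's lexer rules, in declaration order.
    public static final List<Rule> ORIGINAL = List.of(
            new Rule("ObjectKeyword", "Object", false),
            new Rule("ID", "[a-zA-Z]+", false),
            new Rule("ANY_STRING", "[^\\n]+", false),    // (~'\n')+
            new Rule("WhiteSpace", "[ \\t]", true));     // (' '|'\t') -> skip

    // Same rules with ANY_STRING reduced to a single non-blank character.
    public static final List<Rule> FIXED = List.of(
            new Rule("ObjectKeyword", "Object", false),
            new Rule("ID", "[a-zA-Z]+", false),
            new Rule("ANY_STRING", "[^ \\n\\t]", false), // ~(' '|'\n'|'\t')
            new Rule("WhiteSpace", "[ \\t]", true));

    // Longest match wins; on equal length the rule listed first wins.
    public static List<String> tokenize(List<Rule> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at pos; for these simple greedy
                // patterns that is also the longest possible match there.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;             // strictly longer only, so ties keep the earlier rule
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no rule matches at index " + pos);
            }
            if (!best.skip) {
                tokens.add(best.name + "='" + input.substring(pos, pos + bestLen) + "'");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(ORIGINAL, "Object o")); // [ANY_STRING='Object o']
        System.out.println(tokenize(FIXED, "Object o"));    // [ObjectKeyword='Object', ID='o']
        System.out.println(tokenize(FIXED, "Objects"));     // [ID='Objects']
    }
}
```

With the question's rules, 'Object o' comes out as a single ANY_STRING token, which is why the parser takes the anyString alternative. With the reduced rule, ObjectKeyword and ID tie on 'Object' and the earlier ObjectKeyword wins, while the one-character ANY_STRING can at most tie with ID on a single letter and lose on order. The last call shows that length still beats order: ID matches all of 'Objects', so it wins over ObjectKeyword.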