ANTLR - identifier with whitespace


I want identifiers that can contain whitespace.

grammar WhitespaceInSymbols;

premise :   ( options {greedy=false;} : 'IF' )  id=ID{
System.out.println($id.text);
};

ID  :   ('a'..'z'|'A'..'Z')+ (' '('a'..'z'|'A'..'Z')+)* 
;

WS  :   ' '+ {skip();}
;

When I test this with "IF statement analyzed", I get a MissingTokenException and the output "IF statement analyzed".
I thought that by using greedy=false I could tell ANTLR to exit after 'IF' and take it as a token. Instead, the IF becomes part of the ID. Is there a way to achieve my goal? I have already tried some variations of the greedy=false option, but without success.


Solution

  • I thought that by using greedy=false I could tell ANTLR to exit after 'IF' and take it as a token.

    No, the parser has no say in the creation of tokens: the input is first tokenized, and only then are the parser rules applied to those tokens. So setting greedy=false inside a parser rule has no effect on how the lexer forms tokens.
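    The lex-first behaviour described above can be illustrated with a small stand-alone Java sketch. It is not ANTLR's actual lookahead-based algorithm; it is a simplified longest-match tokenizer, and the class name LexFirstSketch and its regular expressions are assumptions that mirror the token shapes in the question's grammar. It shows why the whole input ends up as one ID token before any parser rule is consulted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a lexer: NOT ANTLR's real algorithm, just a
// longest-match tokenizer over the three token shapes from the question.
class LexFirstSketch {
    private static final String[] NAMES = {"IF", "ID", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("IF"),                      // the literal 'IF'
        Pattern.compile("[a-zA-Z]+( [a-zA-Z]+)*"),  // ID that may contain single spaces
        Pattern.compile(" +")                       // WS
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            String bestName = null;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                // lookingAt() only matches starting at the current position;
                // on a tie, the rule listed first wins.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = NAMES[i];
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no token matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // WS is skipped, as in the grammar
                tokens.add(bestName + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID(IF statement analyzed)]
        System.out.println(tokenize("IF statement analyzed"));
    }
}
```

    Because the ID pattern can consume "IF statement analyzed" in one go, no separate 'IF' token is ever produced, and the parser has nothing left to match its 'IF' against.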

    You can do this (creating ID tokens that contain white space), but it would be a horrible solution with many predicates and a few custom lexer methods doing manual look-ahead: you really, really don't want this! A much cleaner solution is to introduce an id rule in your parser and let it match one or more ID tokens.

    A demo:

    grammar WhitespaceInSymbols;
    
    premise
      :  IF id THEN EOF
      ;
    
    id
      :  ID+
      ;
    
    IF
      :  'IF'
      ;
    
    THEN
      :  'THEN'
      ;
    
    ID  
      :  ('a'..'z' | 'A'..'Z')+
      ;
    
    WS  
      :  ' '+ {skip();}
      ;
    

    This parses the input "IF statement analyzed THEN" into the following tree:

    [image: parse tree for the input above]
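
    To show how the multi-word identifier can be recovered once each word is its own ID token, here is a minimal hand-written Java sketch of the same idea (premise : IF id THEN EOF, with id : ID+). It does not use ANTLR; the class IdJoinSketch and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of:
//   premise : IF id THEN EOF ;   id : ID+ ;
class IdJoinSketch {
    // Splits on spaces (the WS rule skips them), so every word is its own token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.trim().split(" +")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // A word is an ID only if it is not one of the keywords.
    static boolean isId(String token) {
        return token.matches("[a-zA-Z]+") && !token.equals("IF") && !token.equals("THEN");
    }

    // Parses IF ID+ THEN and returns the ID words rejoined with single spaces.
    static String parsePremise(String input) {
        List<String> tokens = tokenize(input);
        int pos = 0;
        if (pos >= tokens.size() || !tokens.get(pos).equals("IF")) {
            throw new IllegalArgumentException("expected IF");
        }
        pos++;
        List<String> idWords = new ArrayList<>();
        while (pos < tokens.size() && isId(tokens.get(pos))) {
            idWords.add(tokens.get(pos++));
        }
        if (idWords.isEmpty()) {
            throw new IllegalArgumentException("expected at least one ID");
        }
        if (pos >= tokens.size() || !tokens.get(pos).equals("THEN")) {
            throw new IllegalArgumentException("expected THEN");
        }
        pos++;
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("expected end of input");
        }
        return String.join(" ", idWords);
    }

    public static void main(String[] args) {
        // prints statement analyzed
        System.out.println(parsePremise("IF statement analyzed THEN"));
    }
}
```

    Note the trade-off in this sketch: because whitespace is discarded before parsing, runs of several spaces inside the identifier collapse to one, and the keywords IF and THEN cannot appear as words within an identifier.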