Tags: compiler-construction, parsing, antlr, tokenize, lexer

ANTLR lexer mismatches tokens


I have a simple ANTLR grammar, stripped down to its bare essentials to demonstrate the problem I'm having. I am using ANTLRWorks 1.3.1.

grammar sample;

assignment  :  IDENT ':=' NUM ';' ;

IDENT       :  ('a'..'z')+ ;

NUM         :  ('0'..'9')+ ;

WS          :  (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ;

Obviously, this statement is accepted by the grammar:

x := 99;

But this one also is:

x := @!$()()%99***;

Output from the ANTLRWorks Interpreter:

[Image: ANTLR Interpreter diagram (source: barry at cs.sierracollege.edu)]

What am I doing wrong? Even other sample grammars that come with ANTLR (such as the CMinus grammar) exhibit this behavior.


Solution

  • If you look at the console of your ANTLRWorks IDE, you'll see a lot of lexer errors.

    Try it on the command line:

    grammar Sample;
    
    @members {
      public static void main(String[] args) throws Exception {
        // "\%" is ANTLR 3 action syntax for a literal '%'; the generated Java string contains a plain "%"
        ANTLRStringStream in = new ANTLRStringStream("x := @!$()()\%99***;");
        SampleLexer lexer = new SampleLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        SampleParser parser = new SampleParser(tokens);
        parser.assignment();
      }
    }
    
    assignment  :  IDENT ':=' NUM ';' ;
    
    IDENT       :  ('a'..'z')+ ;
    
    NUM         :  ('0'..'9')+ ;
    
    WS          :  (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ;
    

    and then:

    # generate parser/lexer
    java -cp antlr-3.2.jar org.antlr.Tool Sample.g
    
    # compile
    javac -cp antlr-3.2.jar *.java
    
    # run on Windows
    java -cp .;antlr-3.2.jar SampleParser
    # or run on *nix/MacOS
    java -cp .:antlr-3.2.jar SampleParser
    

    will produce:

    line 1:5 no viable alternative at character '@'
    line 1:6 no viable alternative at character '!'
    line 1:7 no viable alternative at character '$'
    line 1:8 no viable alternative at character '('
    line 1:9 no viable alternative at character ')'
    line 1:10 no viable alternative at character '('
    line 1:11 no viable alternative at character ')'
    line 1:12 no viable alternative at character '%'
    line 1:15 no viable alternative at character '*'
    line 1:16 no viable alternative at character '*'
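    line 1:17 no viable alternative at character '*'

    So the input is not silently accepted. The lexer reports every character it cannot match, skips it, and carries on. After that recovery the parser only ever sees the tokens IDENT ':=' NUM ';', which is exactly what the assignment rule expects, so the parse itself succeeds.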
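
To make the report-and-skip behaviour concrete, here is a minimal plain-Java sketch. It is not ANTLR code and not what ANTLR generates: `LexerRecoverySketch`, `tokenize` and `matchesAssignment` are hypothetical names made up for illustration, and the sketch assumes single-line input (the line number in the messages is hard-coded to 1). It only mimics the token rules of the sample grammar and the error messages shown above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified stand-in for the generated lexer (an assumption-laden sketch,
 * NOT ANTLR's implementation). It recognises IDENT, NUM, ':=' and ';' and,
 * like the output above, reports an unmatched character and skips it.
 */
final class LexerRecoverySketch {

    /** Tokenizes single-line input; every skipped character adds one message to errors. */
    static List<String> tokenize(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
                i++; // WS: hidden, never reaches the parser
            } else if (c >= 'a' && c <= 'z') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') {
                    i++;
                }
                tokens.add("IDENT(" + input.substring(start, i) + ")");
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("NUM(" + input.substring(start, i) + ")");
            } else if (input.startsWith(":=", i)) {
                tokens.add(":=");
                i += 2;
            } else if (c == ';') {
                tokens.add(";");
                i++;
            } else {
                // No rule matches: report the character, skip it, keep going.
                errors.add("line 1:" + i + " no viable alternative at character '" + c + "'");
                i++;
            }
        }
        return tokens;
    }

    /** Mirrors the parser rule: assignment : IDENT ':=' NUM ';' ; */
    static boolean matchesAssignment(List<String> tokens) {
        return tokens.size() == 4
                && tokens.get(0).startsWith("IDENT(")
                && tokens.get(1).equals(":=")
                && tokens.get(2).startsWith("NUM(")
                && tokens.get(3).equals(";");
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<String> tokens = tokenize("x := @!$()()%99***;", errors);
        System.out.println(tokens);                                        // [IDENT(x), :=, NUM(99), ;]
        System.out.println(errors.size() + " lexer errors");               // 11 lexer errors
        System.out.println("parser accepts: " + matchesAssignment(tokens)); // parser accepts: true
    }
}
```

The "parser" in the sketch never sees the junk characters, which is why the rule still matches. If you would rather have the real lexer stop at the first bad character, one approach often used with ANTLR 3 is to override `reportError(RecognitionException e)` in a `@lexer::members` block and throw from it; treat that as a suggestion to check against your ANTLR version rather than something the example above demonstrates.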