ANTLR4 - Token recognition error and mismatched input


I am fairly new to ANTLR grammars. Here is what I have in my g4 file:

tptp_file               : tptp_input* EOF;
tptp_input              : annotated_formula | include;

annotated_formula : fof_annotated  |  cnf_annotated;
fof_annotated : 'fof('name','formula_role','fof_formula annotations').';

name : atomic_word  |  integer;

atomic_word : lower_word  |  single_quoted;

lower_word : lower_alpha alpha_numeric'*';

lower_alpha : '[a-z]';
upper_alpha : '[A-Z]';
numeric : '[0-9]';

alpha_numeric : '('lower_alpha | upper_alpha | numeric | '[_])';

...

I tried running the generated parser on a test file that contains:

fof(an,axiom,p).

But I get an error saying:

line 1:4 token recognition error at: 'a'
line 1:5 token recognition error at: 'n'
line 1:7 token recognition error at: 'a'
line 1:8 token recognition error at: 'x'
line 1:9 token recognition error at: 'io'
line 1:11 token recognition error at: 'm'
line 1:13 token recognition error at: 'p'
line 1:6 mismatched input ',' expecting {'(', ''', '[1-9]', '[a-z]'}
line 1:12 mismatched input ',' expecting '[a-z]'
line 1:14 mismatched input ').' expecting {'(', '[', '[]', '!', '~', '?', '#', '["]', ''', '[1-9]', '[a-z]', '[A-Z]', '[$]'}

Can anybody please help me understand what I am doing wrong and how to fix it? Thanks.

I tried declaring lower_alpha as a fragment.


Solution

  • At a glance, these things are either wrong or bad practice:

    Too many literal tokens in your grammar

    Writing ').' in a parser rule creates a single token that matches only ). and not ) . (with a space in between). Try to minimize literal tokens in parser rules (unless you know what you're doing) and define them as lexer rules instead:

    OPAR : '(';
    CPAR : ')';
    DOT : '.';
    

    and use those tokens in your parser rules.
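
    As a minimal sketch of that idea (the rule name formula_end is made up for illustration and is not part of the TPTP syntax), the parser rule below refers to the tokens by name. Assuming whitespace is skipped by a lexer rule such as SPACES, both ). and ) . should be accepted:

    // hypothetical parser rule: two separate tokens instead of the single literal ').'
    formula_end : CPAR DOT;

    CPAR   : ')';
    DOT    : '.';
    // discard whitespace so it may appear between any two tokens
    SPACES : [ \t\r\n]+ -> skip;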

    No lexer rules

    You have no lexer rules, but try to define tokens inside your parser rules. The rule:

    lower_alpha : '[a-z]';
    

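    matches the literal text [a-z], not a lowercase letter. That is why the error messages list '[a-z]' among the expected tokens, and why every letter in the input causes a token recognition error. Do this instead: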

    // Lexer rule names start with an uppercase letter; character sets are written without quotes
    Lower_alpha : [a-z];
    

    Wrong quantifier

    To match zero or more of something, write Alpha_numeric* (without quotes), not Alpha_numeric'*'. The quoted form matches one Alpha_numeric followed by a literal * character.
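
    Putting the last two points together, the lower_word rule from the question could be sketched as a lexer rule built from fragments. This is only an illustration: the capitalized names are chosen here, and the character sets are taken from the question rather than checked against the full TPTP syntax:

    // a token: one lowercase letter followed by zero or more word characters
    Lower_word : Lower_alpha Alpha_numeric*;

    // fragments are building blocks for other lexer rules and never become tokens themselves
    fragment Lower_alpha   : [a-z];
    fragment Alpha_numeric : [a-zA-Z0-9_];

    Note that Lower_word would also match the text fof, so a keyword rule such as FOF : 'fof'; has to be defined before it for the keyword to win.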

    Fixes

    Here is a grammar that parses the example input fof(an,axiom,p). correctly:

    grammar T;
    
    tptp_file
     : tptp_input* EOF
     ;
    
    tptp_input
     : annotated_formula
     | include
     ;
    
    annotated_formula
     : fof_annotated
     | cnf_annotated
     ;
    
    fof_annotated
     : FOF OPAR name COMMA formula_role COMMA fof_formula annotations? CPAR DOT
     ;
    
    name
     : WORD
     | INTEGER
     ;
    
    // TODO: these are placeholders that just match a name; implement the real parser rules yourself
    formula_role : name;
    include : name;
    cnf_annotated : name;
    fof_formula : name;
    annotations : name;
    
    OPAR : '(';
    CPAR : ')';
    DOT : '.';
    COMMA : ',';
    FOF : 'fof';
    WORD : [a-zA-Z]+;
    INTEGER : [0-9]+;
    SPACES : [ \t\r\n\f]+ -> skip;
    

    [Figure: parse tree of the example input]