ANTLR: Lexer rule accepting strictly one letter, and a token of multiple chars, instead of just one (Java)


I've written the grammar below for an ANTLR parser and lexer that build trees for logical formulae, and I have a couple of questions:

class AntlrFormulaParser extends Parser;

options {
    buildAST = true;
}

biconexpr : impexpr (BICONDITIONAL^ impexpr)*;

impexpr : orexpr (IMPLICATION^ orexpr)*;

orexpr : andexpr (DISJUNCTION^ andexpr)*;

andexpr : notexpr (CONJUNCTION^ notexpr)*;

notexpr : (NEGATION^)? formula;

formula 
    : atom
    | LEFT_PAREN! biconexpr RIGHT_PAREN!
    ;

atom
    : CHAR
    | TRUTH
    | FALSITY
    ;


class AntlrFormulaLexer extends Lexer;

// Atoms
CHAR: 'a'..'z';
TRUTH: ('\u22A4' | 'T');
FALSITY: ('\u22A5' | 'F');

// Grouping
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
NEGATION: ('\u00AC' | '~' | '!');
CONJUNCTION: ('\u2227' | '&' | '^');
DISJUNCTION: ('\u2228' | '|' | 'V');
IMPLICATION: ('\u2192' | "->");
BICONDITIONAL: ('\u2194' | "<->");

WHITESPACE : (' ' | '\t' | '\r' | '\n') { $setType(Token.SKIP); };

The tree grammar:

tree grammar AntlrFormulaTreeParser;

options {
    tokenVocab=AntlrFormula;
    ASTLabelType=CommonTree;
}

expr returns [Formula f]
    : ^(BICONDITIONAL f1=expr f2=expr) {
        $f = new Biconditional(f1, f2);
    }
    | ^(IMPLICATION f1=expr f2=expr) {
        $f = new Implication(f1, f2);
    }
    | ^(DISJUNCTION f1=expr f2=expr) {
        $f = new Disjunction(f1, f2);
    }
    | ^(CONJUNCTION f1=expr f2=expr) {
        $f = new Conjunction(f1, f2);
    }
    | ^(NEGATION f1=expr) {
        $f = new Negation(f1);
    }
    | CHAR {
        $f = new Atom($CHAR.getText());
    }
    | TRUTH {
        $f = Atom.TRUTH;
    }
    | FALSITY {
        $f = Atom.FALSITY;
    }
    ;

The problems I'm having with the above grammar are these:

  1. In the generated Java code for AntlrFormulaLexer, the IMPLICATION and BICONDITIONAL tokens seem to be matched on their first character only ('-' and '<' respectively) rather than on the whole string specified in the grammar.

  2. When I test the generated AntlrFormulaParser with a string such as "~ab", it returns the tree "(~ a)" (and "ab&c" returns just "a"). It should report an error or throw an exception, since the grammar allows an atom to be only one letter, but neither sample string produces any error at all.

I'd really appreciate it if someone could help me solve these two problems. Thank you :)


Solution

  • I would change the following definitions to:

    IMPLICATION: ('\u2192' | '->');
    BICONDITIONAL: ('\u2194' | '<->');
    

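    Note the quotes: '->' instead of "->". ANTLR 3 writes every literal in single quotes; the double-quoted string form is ANTLR 2 syntax.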
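On the first question, generated code that inspects only '-' or '<' is not necessarily matching a single character. A lexer commonly uses the first character just to decide which rule to try, and then requires the full literal inside that rule. The following is a hand-written Java sketch of that idea, not ANTLR's actual generated code; the names ArrowLexerSketch, tokenAt and matchLiteral are hypothetical.

```java
// Hand-written illustration (hypothetical names), not ANTLR-generated code.
final class ArrowLexerSketch {

    // Returns the token name for the text starting at pos, or throws on a mismatch.
    static String tokenAt(String input, int pos) {
        char first = input.charAt(pos);
        switch (first) { // prediction step: only the first character is inspected
            case '\u2192':
                return "IMPLICATION";
            case '-':
                matchLiteral(input, pos, "->"); // matching step: the whole literal is required
                return "IMPLICATION";
            case '\u2194':
                return "BICONDITIONAL";
            case '<':
                matchLiteral(input, pos, "<->");
                return "BICONDITIONAL";
            default:
                throw new IllegalArgumentException("unexpected character '" + first + "'");
        }
    }

    // Throws unless the input contains the complete literal at the given position.
    private static void matchLiteral(String input, int pos, String literal) {
        if (!input.startsWith(literal, pos)) {
            throw new IllegalArgumentException("expected \"" + literal + "\" at position " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenAt("a->b", 1));  // IMPLICATION
        System.out.println(tokenAt("a<->b", 1)); // BICONDITIONAL
        try {
            tokenAt("a-b", 1); // a lone '-' is predicted as IMPLICATION but fails to match
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In this sketch a lone '-' selects the IMPLICATION branch but is still rejected, because the literal check runs after the branch has been chosen.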

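    To solve the error issue, end the start rule with EOF so that the parser must consume the whole input. Put EOF on a separate top-level rule rather than on formula itself: formula is also used recursively inside notexpr, so requiring EOF there would reject every nested formula.

    program
        : biconexpr EOF
        ;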
    

    See also: http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943

    Here is the grammar, fixed to compile against ANTLR 3.3 (save it as AntlrFormula.g):

    grammar AntlrFormula;
    
    options {
        output = AST; 
    }
    
    
    // Start rule: any formula, followed by end of input (EOF is kept out of the AST)
    program : biconexpr EOF! ;
    
    formula : atom | LEFT_PAREN! biconexpr RIGHT_PAREN! ;
    
    biconexpr : impexpr (BICONDITIONAL^ impexpr)*;
    
    impexpr : orexpr (IMPLICATION^ orexpr)*;
    
    orexpr : andexpr (DISJUNCTION^ andexpr)*;
    
    andexpr : notexpr (CONJUNCTION^ notexpr)*;
    
    notexpr : (NEGATION^)? formula;
    
    
    atom
        : CHAR
        | TRUTH
        | FALSITY
        ;
    
    
    // Atoms
    CHAR: 'a'..'z';
    TRUTH: ('\u22A4' | 'T');
    FALSITY: ('\u22A5' | 'F');
    
    // Grouping
    LEFT_PAREN: '(';
    RIGHT_PAREN: ')';
    NEGATION: ('\u00AC' | '~' | '!');
    CONJUNCTION: ('\u2227' | '&' | '^');
    DISJUNCTION: ('\u2228' | '|' | 'V');
    IMPLICATION: ('\u2192' | '->');
    BICONDITIONAL: ('\u2194' | '<->');
    
    WHITESPACE : (' ' | '\t' | '\r' | '\n') { $channel = HIDDEN; };
    

    Link to the ANTLR 3.3 binary: http://www.antlr.org/download/antlr-3.3-complete.jar

    You will need to invoke the program rule, rather than one of the inner rules, so that the parser tries to match the complete input.

    You can test it with this class:

    import org.antlr.runtime.*;
    
    public class Main {
        public static void main(String[] args) {
            AntlrFormulaLexer lexer = new AntlrFormulaLexer(new ANTLRStringStream("(~ab)"));
            AntlrFormulaParser p = new AntlrFormulaParser(new CommonTokenStream(lexer));
    
            try {
                p.program();
                if ( p.failed() || p.getNumberOfSyntaxErrors() != 0) {
                    System.out.println("failed");
                }
            } catch (RecognitionException e) {
                e.printStackTrace();
            }
        }
    }
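
The effect of requiring end of input in the start rule can be illustrated without ANTLR. The sketch below is a hand-written recognizer for a single atom, not generated code, and its names (EofCheckSketch, acceptsWithoutEof, acceptsWithEof) are hypothetical. It assumes, as in the question, that a parser whose start rule lacks EOF simply stops after the first complete match.

```java
// Hand-written illustration (hypothetical names), not ANTLR-generated code.
final class EofCheckSketch {
    private final String input;
    private int pos = 0;

    EofCheckSketch(String input) {
        this.input = input;
    }

    // atom : 'a'..'z' ;  consumes exactly one lowercase letter
    private void atom() {
        if (pos < input.length() && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') {
            pos++;
        } else {
            throw new IllegalArgumentException("expected a letter at position " + pos);
        }
    }

    // Like "program : atom ;": stops after one atom and ignores whatever follows.
    static boolean acceptsWithoutEof(String s) {
        try {
            new EofCheckSketch(s).atom();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Like "program : atom EOF ;": additionally requires that no input is left over.
    static boolean acceptsWithEof(String s) {
        EofCheckSketch p = new EofCheckSketch(s);
        try {
            p.atom();
        } catch (IllegalArgumentException e) {
            return false;
        }
        return p.pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(acceptsWithoutEof("ab")); // true: the trailing 'b' is never examined
        System.out.println(acceptsWithEof("ab"));    // false: 'b' is left over
        System.out.println(acceptsWithEof("a"));     // true
    }
}
```

This mirrors the behaviour reported in the question: "ab" is silently accepted as "a" until the start rule is made to demand the end of the input.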