Tags: java, antlr, antlr4, antlr3

ANTLR: Is there a simple example?


I'd like to get started with ANTLR, but after spending a few hours reviewing the examples on the antlr.org site, I still don't have a clear understanding of the grammar-to-Java process.

Is there a simple example, something like a four-operation calculator implemented with ANTLR, that goes from the parser definition all the way to the Java source code?


Solution

  • Note: this answer is for ANTLR3! If you're looking for an ANTLR4 example, this Q&A demonstrates how to create a simple expression parser and evaluator using ANTLR4.


    You first create a grammar. Below is a small grammar that you can use to evaluate expressions built from the four basic math operators: +, -, * and /. You can also group expressions using parentheses.

    Note that this grammar is a very basic one: it does not handle unary operators (the minus in: -1+9) or decimals like .99 (without a leading digit), to name just two shortcomings. It is just an example you can build on yourself.

    Here are the contents of the grammar file Exp.g:

    grammar Exp;
    
    /* This will be the entry point of our parser. */
    eval
        :    additionExp EOF
        ;
    
    /* Addition and subtraction have the lowest precedence. */
    additionExp
        :    multiplyExp 
             ( '+' multiplyExp 
             | '-' multiplyExp
             )* 
        ;
    
    /* Multiplication and division have a higher precedence. */
    multiplyExp
        :    atomExp
             ( '*' atomExp 
             | '/' atomExp
             )* 
        ;
    
    /* An expression atom is the smallest part of an expression: a number. When
       we encounter parentheses, we make a recursive call back to the rule
       'additionExp'. As you can see, an 'atomExp' has the highest precedence. */
    atomExp
        :    Number
        |    '(' additionExp ')'
        ;
    
    /* A number: can be an integer value, or a decimal value */
    Number
        :    ('0'..'9')+ ('.' ('0'..'9')+)?
        ;
    
    /* We're going to ignore all white space characters */
    WS  
        :   (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;}
        ;
    

    (Parser rules start with a lowercase letter, and lexer rules start with a capital letter.)
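
    To make the two lexer rules (Number and WS) more concrete, here is a minimal hand-written sketch in plain Java of what they match. It is not the code ANTLR generates: the class name LexerSketch, the method tokenize and the regular expression are assumptions made purely for illustration. White space is matched and then dropped, which approximates how tokens on the HIDDEN channel are ignored by the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT generated by ANTLR: mimics the lexer rules of Exp.g.
class LexerSketch {
    // Group 1 = Number:  ('0'..'9')+ ('.' ('0'..'9')+)?
    // Group 2 = one of the literal tokens used by the parser rules
    // Group 3 = WS:      ' ' | '\t' | '\r' | '\n'
    private static final Pattern TOKEN =
        Pattern.compile("(\\d+(?:\\.\\d+)?)|([-+*/()])|([ \\t\\r\\n])");

    /** Splits the input into token texts; white space is recognised but not returned. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token matches at index " + pos);
            }
            if (m.group(3) == null) {   // keep everything except white space
                tokens.add(m.group());
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 * (5 - 6.5)"));   // prints [12, *, (, 5, -, 6.5, )]
    }
}
```

    Because a Number needs at least one leading digit, tokenize(".99") throws in this sketch, which corresponds to the shortcoming mentioned earlier.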

    After creating the grammar, you'll want to generate a parser and lexer from it. Download the ANTLR jar and store it in the same directory as your grammar file.

    Execute the following command in your shell or command prompt:

    java -cp antlr-3.2.jar org.antlr.Tool Exp.g
    

    It should not produce any error message, and the files ExpLexer.java, ExpParser.java and Exp.tokens should now be generated.

    To see if it all works properly, create this test class:

    import org.antlr.runtime.*;
    
    public class ANTLRDemo {
        public static void main(String[] args) throws Exception {
            ANTLRStringStream in = new ANTLRStringStream("12*(5-6)");
            ExpLexer lexer = new ExpLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ExpParser parser = new ExpParser(tokens);
            parser.eval();
        }
    }
    

    and compile it:

    # *nix/macOS
    javac -cp .:antlr-3.2.jar ANTLRDemo.java
    
    REM Windows
    javac -cp .;antlr-3.2.jar ANTLRDemo.java
    

    and then run it:

    # *nix/macOS
    java -cp .:antlr-3.2.jar ANTLRDemo
    
    REM Windows
    java -cp .;antlr-3.2.jar ANTLRDemo
    

    If all goes well, nothing is printed to the console. This means the parser did not find any errors. When you change "12*(5-6)" into "12*(5-6" and then recompile and run it, the following should be printed:

    line 0:-1 mismatched input '<EOF>' expecting ')'
    

    Okay, now we want to add a bit of Java code to the grammar so that the parser actually does something useful. You add code by placing plain Java between { and } inside your grammar.

    But first: all parser rules in the grammar file should return a primitive double value. You can do that by adding returns [double value] after each rule name:

    grammar Exp;
    
    eval returns [double value]
        :    additionExp EOF
        ;
    
    additionExp returns [double value]
        :    multiplyExp 
             ( '+' multiplyExp 
             | '-' multiplyExp
             )* 
        ;
    
    // ...
    

    This needs little explanation: every rule now returns a double value. The return value is declared outside any plain Java code block {...}, so to refer to it from inside a code block you need to put a dollar sign in front of its name, as in $value:

    grammar Exp;
    
    /* This will be the entry point of our parser. */
    eval returns [double value]
        :    additionExp EOF { /* plain code block! */ System.out.println("value equals: "+$value); }
        ;
        
    // ...
    

    Here's the grammar but now with the Java code added:

    grammar Exp;
    
    eval returns [double value]
        :    exp=additionExp EOF {$value = $exp.value;}
        ;
    
    additionExp returns [double value]
        :    m1=multiplyExp       {$value =  $m1.value;} 
             ( '+' m2=multiplyExp {$value += $m2.value;} 
             | '-' m2=multiplyExp {$value -= $m2.value;}
             )* 
        ;
    
    multiplyExp returns [double value]
        :    a1=atomExp       {$value =  $a1.value;}
             ( '*' a2=atomExp {$value *= $a2.value;} 
             | '/' a2=atomExp {$value /= $a2.value;}
             )* 
        ;
    
    atomExp returns [double value]
        :    n=Number                {$value = Double.parseDouble($n.text);}
        |    '(' exp=additionExp ')' {$value = $exp.value;}
        ;
    
    Number
        :    ('0'..'9')+ ('.' ('0'..'9')+)?
        ;
    
    WS  
        :   (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;}
        ;
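
    To show how the embedded actions compute a value, here is a rough hand-written recursive-descent sketch in plain Java in which each method mirrors one parser rule of the grammar above. It is not what ANTLR generates: the class ExpSketch and all of its members are hypothetical names chosen for illustration, and white space is skipped inline instead of by a separate lexer. The (...)* loops become while loops, which is what makes + - * / left-associative, and the nesting of the rules is what gives * and / a higher precedence.

```java
// Hypothetical illustration, NOT generated by ANTLR: one method per parser rule.
class ExpSketch {
    private final String src;
    private int pos = 0;

    private ExpSketch(String src) { this.src = src; }

    /** eval : additionExp EOF */
    public static double eval(String input) {
        ExpSketch p = new ExpSketch(input);
        double value = p.additionExp();
        p.skipWs();
        if (p.pos < p.src.length()) {
            throw new IllegalArgumentException("unexpected input at index " + p.pos);
        }
        return value;
    }

    /** additionExp : multiplyExp ( '+' multiplyExp | '-' multiplyExp )* */
    private double additionExp() {
        double value = multiplyExp();                   // {$value =  $m1.value;}
        while (true) {
            if (accept('+'))      value += multiplyExp(); // {$value += $m2.value;}
            else if (accept('-')) value -= multiplyExp(); // {$value -= $m2.value;}
            else return value;
        }
    }

    /** multiplyExp : atomExp ( '*' atomExp | '/' atomExp )* */
    private double multiplyExp() {
        double value = atomExp();                       // {$value =  $a1.value;}
        while (true) {
            if (accept('*'))      value *= atomExp();   // {$value *= $a2.value;}
            else if (accept('/')) value /= atomExp();   // {$value /= $a2.value;}
            else return value;
        }
    }

    /** atomExp : Number | '(' additionExp ')' */
    private double atomExp() {
        if (accept('(')) {
            double value = additionExp();
            if (!accept(')')) {
                throw new IllegalArgumentException("expecting ')' at index " + pos);
            }
            return value;
        }
        skipWs();
        int start = pos;
        while (isDigit(pos)) pos++;                     // ('0'..'9')+
        if (pos == start) {
            throw new IllegalArgumentException("expecting a number at index " + pos);
        }
        if (pos < src.length() && src.charAt(pos) == '.' && isDigit(pos + 1)) {
            pos++;                                      // ('.' ('0'..'9')+)?
            while (isDigit(pos)) pos++;
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    private boolean isDigit(int i) {
        return i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9';
    }

    /** Consumes the character c if it is next (after white space). */
    private boolean accept(char c) {
        skipWs();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipWs() {
        while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) pos++;
    }

    public static void main(String[] args) {
        System.out.println(eval("12*(5-6)"));           // prints -12.0
    }
}
```

    With this sketch, ExpSketch.eval("2+3*4") gives 14.0 and ExpSketch.eval("8/4/2") gives 1.0, which is the precedence and associativity the grammar encodes.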
    

    and since our eval rule now returns a double, change your ANTLRDemo.java into this:

    import org.antlr.runtime.*;
    
    public class ANTLRDemo {
        public static void main(String[] args) throws Exception {
            ANTLRStringStream in = new ANTLRStringStream("12*(5-6)");
            ExpLexer lexer = new ExpLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ExpParser parser = new ExpParser(tokens);
            System.out.println(parser.eval()); // print the value
        }
    }
    

    Regenerate the lexer and parser from your grammar (1), compile all classes (2), and run ANTLRDemo (3):

    # *nix/macOS
    java -cp antlr-3.2.jar org.antlr.Tool Exp.g   # 1
    javac -cp .:antlr-3.2.jar ANTLRDemo.java      # 2
    java -cp .:antlr-3.2.jar ANTLRDemo            # 3
    
    REM Windows (steps 1, 2 and 3, in that order)
    java -cp antlr-3.2.jar org.antlr.Tool Exp.g
    javac -cp .;antlr-3.2.jar ANTLRDemo.java
    java -cp .;antlr-3.2.jar ANTLRDemo
    

    and you'll now see the outcome of the expression 12*(5-6), which is -12.0, printed to your console!

    Again: this is a very brief explanation. I encourage you to browse the ANTLR wiki and read some tutorials and/or play a bit with what I just posted.

    Good luck!

    EDIT:

    This post shows how to extend the example above so that a Map<String, Double> can be provided that holds variables in the provided expression.

    To get this code working with a current version of ANTLR (June 2014), I needed to make a few changes: ANTLRStringStream had to become ANTLRInputStream, the returned value had to change from parser.eval() to parser.eval().value, and I had to remove the WS rule at the end, because attribute references such as $channel are no longer allowed in lexer actions. (ANTLR4 expresses the same intent with a lexer command such as -> skip or -> channel(HIDDEN) rather than with an action.)