java jvm antlr grammar parser-generator

ANTLRWorks grammar parser


I created a simple grammar in ANTLRWorks. I then generated code from it and now have two files: grammarLexer.java and grammarParser.java. My goal is to map my grammar to the Java language. What should I do next to achieve that?

Here is my grammar: `
grammar grammar;

prog : ((FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | VARIABLE) | FUNCTION_DEC)+;

FOR        :     WS* 'for' WS+ VARIABLE WS+ DIGIT+ WS+ DIGIT+ WS* ENTER  ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER;
WHILE        :     WS* 'while' WS+ (VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER  (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))* WS* 'end' WS* ENTER;
IF        :         WS* 'if' WS+ ( FUNCTION | VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC)* ( WS* 'else' ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))*)? WS* 'end' WS* ENTER;

CHAR        :     ('a'..'z'|'A'..'Z')+;
EQ_OPERATOR    :    ('<' | '>' | '==' | '>=' | '<=' | '!=');
DIGIT        :     '0'..'9'+;
ENTER        :     '\n';
WS        :     ' ' | '\t';

PRINT_TEMPLATE  :     WS+ (('"' (CHAR | DIGIT | WS)* '"') | VARIABLE | DIGIT+ | FUNCTION | INC_DEC);
PRINT             :     WS* 'print' PRINT_TEMPLATE (',' PRINT_TEMPLATE)*  WS* ENTER;

VARIABLE        :    CHAR(CHAR|DIGIT)*;
FUN_TEMPLATE    :    WS* (VARIABLE | DIGIT+ | '"' (CHAR | DIGIT | WS)* '"');
FUNCTION        :    VARIABLE '(' (FUN_TEMPLATE (WS* ',' FUN_TEMPLATE)*)? ')' WS* ENTER*;

DECLARATION     :    WS* VARIABLE WS* ('=' WS* (DIGIT+ | '"' (CHAR | DIGIT | WS)* '"' | VARIABLE)) WS* ENTER;
FUNCTION_DEC    :    WS*'def' WS* FUNCTION ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER*;

INC_DEC            :    VARIABLE ('--' | '++') WS* ENTER*;`

Here is my Main class for parser: `
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Parser;

public class Main {
    public static void main(String[] args) throws Exception {  
        // the input source  
        String source =   
            "for i 1 3\n " +
            "printHi()\n " +
            "end\n " +
            "if fun(y, z) == 0\n " +
            "end\n ";
        // create an instance of the lexer
        grammarLexer lexer = new grammarLexer(new ANTLRStringStream(source));

        // wrap a token-stream around the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // traverse the tokens and print them to see if the correct tokens are created
        int n = 1;
        for (Object o : tokens.getTokens()) {
            CommonToken token = (CommonToken) o;
            System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));
            n++;
        }

        // invoke the grammar's entry rule: the grammar above defines prog, not file
        grammarParser parser = new grammarParser(tokens);
        parser.prog();
    }
}
`

Solution

  • As I already mentioned in the comments: your overuse of lexer rules is wrong. Think of lexer rules as the fundamental building blocks of your language, much like the atoms you'd use to describe water in chemistry. You would not describe water like this:

    WATER
     : 'HHO'
     ;
    

    I.e., as a single element. Water should be described as three separate atoms:

    water
     : Hydrogen Hydrogen Oxygen
     ;
    
    Hydrogen : 'H';
    Oxygen   : 'O';
    

    where Hydrogen and Oxygen are the fundamental building blocks (lexer rules) and water is the compound (the parser rule).

    A good rule of thumb is that if you're creating lexer rules that consist of several other lexer rules, chances are there's something fishy in your grammar. This is not always the case, of course.

    Let's say you want to parse the following input:

    for i 1 3
      print(i)
    end
    
    if fun(y, z) == 0
      print('foo')
    end
    

    A grammar could look like this:

    grammar T;
    
    options {
      output=AST;
    }
    
    tokens {
      BLOCK;
      CALL;
      PARAMS;
    }
    
    // parser rules
    parse
     : block EOF!
     ;
    
    block
     : stat* -> ^(BLOCK stat*)
     ;
    
    stat
     : for_stat
     | if_stat
     | func_call
     ;
    
    for_stat
     : FOR^ ID expr expr block END!
     ;
    
    if_stat
     : IF^ expr block END!
     ;
    
    expr
     : eq_expr
     ;
    
    eq_expr
     : atom (('==' | '!=')^ atom)*
     ;
    
    atom
     : func_call
     | INT
     | ID
     | STR
     ;
    
    func_call
     : ID '(' params ')' -> ^(CALL ID params)
     ;
    
    params
     : (expr (',' expr)*)? -> ^(PARAMS expr*)
     ;
    
    // lexer rules
    FOR : 'for';
    END : 'end';
    IF  : 'if';
    ID  : ('a'..'z' | 'A'..'Z')+;
    INT : '0'..'9'+;
    STR : '\'' ~('\'')* '\'';
    SP  : (' ' | '\t' | '\r' | '\n')+ {skip();};
    
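    To make the lexer's job concrete, the lexer rules above can be approximated by hand in plain Java. This is only a sketch and not the code ANTLR generates: the class `TokenSketch`, the method `tokenize`, the `TYPE:text` output format and the `LIT` type for the literals that appear in the parser rules are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated TLexer, for illustration only.
class TokenSketch {
  // Words with these exact spellings become FOR, END and IF instead of ID.
  private static final List<String> KEYWORDS = Arrays.asList("for", "end", "if");

  // One alternative per kind of token, mirroring the lexer rules in the grammar.
  private static final Pattern TOKEN = Pattern.compile(
      "(?<SP>[ \\t\\r\\n]+)"      // SP: whitespace, skipped
    + "|(?<WORD>[a-zA-Z]+)"       // ID or a keyword
    + "|(?<INT>[0-9]+)"           // INT
    + "|(?<STR>'[^']*')"          // STR
    + "|(?<OP>==|!=|[(),])");     // literals used directly in parser rules

  static List<String> tokenize(String src) {
    List<String> out = new ArrayList<>();
    Matcher m = TOKEN.matcher(src);
    int pos = 0;
    while (pos < src.length()) {
      m.region(pos, src.length());
      if (!m.lookingAt()) {
        throw new IllegalArgumentException("unexpected character at index " + pos);
      }
      pos = m.end();
      if (m.group("SP") != null) {
        continue; // same effect as {skip();} in the SP rule
      }
      String text = m.group();
      String type;
      if (m.group("WORD") != null) {
        type = KEYWORDS.contains(text) ? text.toUpperCase(Locale.ROOT) : "ID";
      } else if (m.group("INT") != null) {
        type = "INT";
      } else if (m.group("STR") != null) {
        type = "STR";
      } else {
        type = "LIT";
      }
      out.add(type + ":" + text);
    }
    return out;
  }
}
```

    Each token is a small, indivisible piece of the input. Combining them into `for` statements, calls and expressions is left entirely to the parser rules.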

    And if you now run this test class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;
    
    public class Main {
      public static void main(String[] args) throws Exception {
        String src = 
            "for i 1 3          \n" + 
            "  print(i)         \n" + 
            "end                \n" + 
            "                   \n" + 
            "if fun(y, z) == 0  \n" + 
            "  print('foo')     \n" + 
            "end                \n";
        TLexer lexer = new TLexer(new ANTLRStringStream(src));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.parse().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }
    

    you'll see DOT source printed to the console, which describes the following AST:

    [image: the AST produced for the input above]
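    Once you have a tree of this shape, one possible next step towards mapping your language to Java is to walk it recursively and emit Java source text. The following is a minimal sketch under several assumptions, not a definitive implementation: `Node` is a hypothetical stand-in for ANTLR's `CommonTree` so that the example runs without the ANTLR runtime, `JavaEmitter` is a made-up name, the `for` loop is assumed to include both bounds, and the emitted code assumes that methods such as `print` exist in the target class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.antlr.runtime.tree.CommonTree.
class Node {
  final String text;
  final List<Node> children;

  Node(String text, Node... children) {
    this.text = text;
    this.children = Arrays.asList(children);
  }
}

// Walks an AST shaped like the one built by grammar T and emits Java source text.
class JavaEmitter {
  // Statements: BLOCK, for, if and CALL nodes.
  static String stat(Node n, String indent) {
    switch (n.text) {
      case "BLOCK": {
        StringBuilder sb = new StringBuilder();
        for (Node child : n.children) {
          sb.append(stat(child, indent));
        }
        return sb.toString();
      }
      case "for": {
        // children: ID, start expr, end expr, BLOCK (inclusive end is an assumption)
        String v = n.children.get(0).text;
        return indent + "for (int " + v + " = " + expr(n.children.get(1)) + "; "
            + v + " <= " + expr(n.children.get(2)) + "; " + v + "++) {\n"
            + stat(n.children.get(3), indent + "  ")
            + indent + "}\n";
      }
      case "if":
        // children: condition expr, BLOCK
        return indent + "if (" + expr(n.children.get(0)) + ") {\n"
            + stat(n.children.get(1), indent + "  ")
            + indent + "}\n";
      case "CALL":
        return indent + expr(n) + ";\n";
      default:
        throw new IllegalArgumentException("not a statement: " + n.text);
    }
  }

  // Expressions: CALL, == and != nodes, and INT, ID or STR leaves.
  static String expr(Node n) {
    switch (n.text) {
      case "CALL": {
        // children: ID, PARAMS
        List<String> args = new ArrayList<>();
        for (Node p : n.children.get(1).children) {
          args.add(expr(p));
        }
        return n.children.get(0).text + "(" + String.join(", ", args) + ")";
      }
      case "==":
      case "!=":
        return expr(n.children.get(0)) + " " + n.text + " " + expr(n.children.get(1));
      default:
        // Leaf: rewrite 'foo' as "foo" (no escaping is handled), pass INT and ID through.
        if (n.text.length() >= 2 && n.text.startsWith("'") && n.text.endsWith("'")) {
          return "\"" + n.text.substring(1, n.text.length() - 1) + "\"";
        }
        return n.text;
    }
  }
}
```

    With the real tree returned by `parser.parse().getTree()`, the same recursion would use `getChild(i)`, `getChildCount()` and `getText()`. Switching on `getType()` against the token constants of the generated parser would likely be more robust than comparing node text, since an identifier could otherwise collide with a name such as `BLOCK`.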