Search code examples
parsingantlrconvertersgrammarebnf

EBNF grammar to ANTLR3?


I have this EBNF grammar for the Jass scripting language.
What needs to be done to convert it to work with ANTLR 3.5?
Furthermore, are there any sort of tools available to aid me in doing so?

//----------------------------------------------------------------------
// Global Declarations
//----------------------------------------------------------------------
program  ::= file+
file     ::= newline? ( declr newline )* func*
declr    ::= typedef
           | globals
           | native_func
typedef  ::= 'type' id 'extends' ( 'handle' | id )
globals  ::= 'globals' newline global_var_list 'endglobals'
global_var_list
         ::= ( 'constant' type id '=' expr newline | var_declr newline )*
native_func
         ::= 'constant'? 'native' func_declr
func_declr
         ::= id 'takes' ( 'nothing' | param_list ) 'returns' ( type | 'nothing' )
param_list
         ::= type id ( ',' type id )*
func     ::= 'constant'? 'function' func_declr newline local_var_list statement_list 'endfunction' newline

//----------------------------------------------------------------------
// Local Declarations
//----------------------------------------------------------------------
local_var_list
         ::= ( 'local' var_declr newline )*
var_declr
         ::= type id ( '=' expr )?
           | type 'array' id
statement_list
         ::= ( statement newline )*
statement
         ::= set
           | call
           | ifthenelse
           | loop
           | exitwhen
           | return
           | debug
set      ::= 'set' id '=' expr
           | 'set' id '[' expr ']' '=' expr
call     ::= 'call' id '(' args? ')'
args     ::= expr ( ',' expr )*
ifthenelse
         ::= 'if' expr 'then' newline statement_list else_clause? 'endif'
else_clause
         ::= 'else' newline statement_list
           | 'elseif' expr 'then' newline statement_list else_clause?
loop     ::= 'loop' newline statement_list 'endloop'
exitwhen ::= 'exitwhen' expr
return   ::= 'return' expr?
debug    ::= 'debug' ( set | call | ifthenelse | loop )

//----------------------------------------------------------------------
// Expressions
//----------------------------------------------------------------------
expr     ::= binary_op
           | unary_op
           | func_call
           | array_ref
           | func_ref
           | id
           | const
           | parens
binary_op
         ::= expr ( [+-*/><] | '==' | '!=' | '>=' | '<=' | 'and' | 'or' ) expr
unary_op ::= ( '+' | '-' | 'not' ) expr
func_call
         ::= id '(' args? ')'
array_ref
         ::= id '[' expr ']'
func_ref ::= 'function' id
const    ::= int_const
           | real_const
           | bool_const
           | string_const
           | 'null'
int_const
         ::= decimal
           | octal
           | hex
           | fourcc
decimal  ::= [1-9] [0-9]*
octal    ::= '0' [0-7]*
hex      ::= '$' [0-9a-fA-F]+
           | '0' [xX] [0-9a-fA-F]+
fourcc   ::= '' ' .{4} ' ''
real_const
         ::= [0-9]+ '.' [0-9]*
           | '.' [0-9]+
bool_const
         ::= 'true'
           | 'false'
string_const
         ::= '"' .* '"'
parens   ::= '(' expr ')'

//----------------------------------------------------------------------
// Base RegEx
//----------------------------------------------------------------------
type     ::= id
           | 'code'
           | 'handle'
           | 'integer'
           | 'real'
           | 'boolean'
           | 'string'
id       ::= [a-zA-Z] ( [a-zA-Z0-9_]* [a-zA-Z0-9] )?
newline  ::= '\n'+


Thanks in advance to any advice offered!


Solution

  • Disclaimer: I don't actually use ANTLR, so someone that does might come by with more detailed information.

    ANTLR generates recursive descent parsers, so your grammar will have to be refactored to eliminate left recursion, which you have e.g. in expr:

    expr     ::= binary_op
    ...
    binary_op
             ::= expr ( [+-*/><] | '==' | '!=' | '>=' | '<=' | 'and' | 'or' ) expr
    

    While parsing expr, the parser would try binary_op as an option, encounter another expr, then try to parse that recursively without having consumed any input, and you would now have infinite recursion.

    This is usually dealt with by reformulating the grammar along the lines of

    expr     ::= binary_op
    ...
    binary_op
             ::= term ( [+-] term )
    
    term = factor ( [*/] factor)
    
    factor = id
             | const
             | parens
             ...
    

    and so on.

    Not a trivial process, but not impossible to do either.