Tags: parsing, antlr, grammar, bnf, ebnf

EBNF with comma repetition


I frequently find myself writing the following to allow one or more entries separated by commas:

( function | expression ) ( ',' ( function | expression ))*

Is there a more compact way to do this? Ideally I'd just like to be able to do something along the lines of:

( function | expression ) [,...]

Or:

( function | expression ',')*

By the way, I am using this as a validator: https://www.bottlecaps.de/rr/ui#_Production


The whole grammar I am trying to 'clean up' is the following:

AGGREGATION
  ::= 'GROUP BY' ( GROUPING_ROWS | PIVOT )?

PIVOT
  ::= 'PIVOT(' AXIS_EXPR (AXIS_EXPR ',' )? ')'

AXIS_EXPR
  ::= expr ( 'AS'? alias )? 'ON' ( 'ROWS' | 'COLS' ) ( 'HAVING' expr )? ( 'ORDER BY' expr ( 'ASC' | 'DESC' )? )? ( 'LIMIT' num 'PERCENT'? )?

GROUPING_ROWS
  ::= 'GROUPING_ROWS(' GROUPING_EXPR (GROUPING_EXPR ',' )? ')'

GROUPING_EXPR
  ::= NAME_OR_POS 'SUBTOTAL' 'S'? GROUPING_EXPR_SUBTOTAL (',' GROUPING_EXPR_SUBTOTAL)*

GROUPING_EXPR_SUBTOTAL
  ::= NAME_OR_POS ':'  AGGREGATED_CALCULATION ( ',' AGGREGATED_CALCULATION )*

NAME_OR_POS
  ::= ( name | pos )

AGGREGATED_CALCULATION
  ::= ( aggregation_function | aggregation_expression ) ( 'AS'? alias)?

And as an example of the construct I find myself using all the time:

[image: example of the construct, not reproduced here]


Solution

  • ( function | expression ) ( ',' ( function | expression ))*
    

    Is there a more compact way to do this?

    Other than introducing "helper rules" like this:

    rule
     : atom_list
     ;
    
    atom_list
     : atom (',' atom)*
     ;
    
    atom
     : function
     | expression
     ;
    

    the answer is: no, ANTLR has no shorter way to write a (',' a)*. Something like (a ',')* is not equivalent, because it requires a comma after every item, including the last one.

    If you're repeating function | expression a lot, at the very least factor those alternatives out into a separate rule.
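
    Applied to the grammar in the question, the same idea might look like the sketch below, written in the question's ::= notation. The rule names AXIS_EXPR_LIST and AGGREGATED_CALCULATION_LIST are hypothetical helpers, and the PIVOT rewrite assumes that (AXIS_EXPR ',')? was meant to be a comma-separated list of AXIS_EXPR.

    ```ebnf
    /* Hypothetical helper: one or more AXIS_EXPR separated by commas. */
    AXIS_EXPR_LIST
      ::= AXIS_EXPR ( ',' AXIS_EXPR )*

    /* Assumes the original (AXIS_EXPR ',')? was intended as a comma-separated list. */
    PIVOT
      ::= 'PIVOT(' AXIS_EXPR_LIST ')'

    /* Hypothetical helper for the repetition used in GROUPING_EXPR_SUBTOTAL. */
    AGGREGATED_CALCULATION_LIST
      ::= AGGREGATED_CALCULATION ( ',' AGGREGATED_CALCULATION )*

    GROUPING_EXPR_SUBTOTAL
      ::= NAME_OR_POS ':' AGGREGATED_CALCULATION_LIST
    ```

    Each list rule is written once, and every place that needs the repetition refers to it by name. This does not make the grammar shorter overall, but it keeps the a (',' a)* pattern in a single place.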