Parsing Union[Dict[str,str],Dict[str,str]] with ANTLR 3


I am trying to write a parser for strings such as "Union[Dict[str,str],Dict[str,str]]" with ANTLR 3. Below is the grammar I use to generate the parser.

grammar PyType;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
    OPEN_SQ_BR = '[';
    CLOSE_SQ_BR = ']';
    LIST = 'List';
    SET = 'Set';
    UNION = 'Union';
    DICT = 'Dict';
    TUPLE = 'Tuple';
    COMMA = ',';
 /*   Nothing = 'nothing'; */
    OPTIONAL = 'Optional';
    HYPHEN = '-' ;
    UNDERSCORE = '_' ;
    DOT = '\.';
}


/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
parse
    :  expr
    ;

list_element
    : OPEN_SQ_BR expr CLOSE_SQ_BR -> expr
    ;

union_element
    : OPEN_SQ_BR (expr COMMA)+ CLOSE_SQ_BR -> expr+;

list_expr
    : LIST^ list_element*;

set_expr
    : SET^ list_element*;

union_expr
    : UNION^ union_element;

dict_expr
    : DICT^ union_element;

tuple_expr
    : TUPLE^ union_element;

optional_expr
    : OPTIONAL^ union_element;


DIGIT  : '0'..'9' ;

LETTER : 'a'..'z' |'A'..'Z'|'0'..'9'|'_' ;

NUMBER : DIGIT+ ;

SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.')*('a'..'z'|'A'..'Z'|'_'|'0'..'9')
           ;


expr : list_expr
    | set_expr
    | SimpleType
    | union_expr
    | dict_expr
    | tuple_expr
    | optional_expr;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

The following strings are parsed correctly with the above grammar:

  1. Union[Dict[str,str]]
  2. Dict[str,str]
  3. List[str]

However, when more than one Union, Dict, or Tuple is nested inside a Union, Dict, or Tuple, the input does not parse correctly. For example, Union[Dict[str,str],Dict[str,str]] does not parse correctly.

Could someone please help me spot the error in the grammar?


Solution

  • Your rule:

    union_element
        : OPEN_SQ_BR (expr COMMA)+ CLOSE_SQ_BR -> expr+
        ;
    

    can't be right: (expr COMMA)+ requires every expr, including the last one, to be followed by a comma. It therefore does not match Union[Dict[str,str]] (or, as far as I can see, any of the other inputs you mentioned), but it does match things like Union[Dict[str,str,],].

    You should do:

    union_element
        : OPEN_SQ_BR expr (COMMA expr)* CLOSE_SQ_BR -> expr+
        ;
    

    With that change, I think input like Union[Dict[str,str],Dict[str,str]] will also be matched properly.
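
To make the difference concrete, here is a minimal hand-written sketch in Java of the corrected shape, expr (COMMA expr)*. It is not ANTLR-generated code and does not use the ANTLR runtime; the class name PyTypeSketch, its parse method, and the LISP-like output format are all hypothetical and chosen only for illustration. It also simplifies the grammar by treating every name the same way (List, Dict, str, ...) instead of having one rule per keyword.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of the original grammar or of ANTLR.
final class PyTypeSketch {
    private final String src;
    private int pos = 0;

    private PyTypeSketch(String src) {
        // Drop whitespace up front, roughly like sending WHITESPACE to a hidden channel.
        this.src = src.replaceAll("\\s+", "");
    }

    // Parses the whole input and returns a LISP-like tree string.
    static String parse(String input) {
        PyTypeSketch p = new PyTypeSketch(input);
        String tree = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return tree;
    }

    // expr : NAME ( '[' expr ( ',' expr )* ']' )?
    private String expr() {
        String name = name();
        if (pos < src.length() && src.charAt(pos) == '[') {
            pos++; // consume '['
            List<String> args = new ArrayList<>();
            args.add(expr());                 // first element, no comma required
            while (pos < src.length() && src.charAt(pos) == ',') {
                pos++;                        // a comma must be followed by another expr
                args.add(expr());
            }
            expect(']');
            return "(" + name + " " + String.join(" ", args) + ")";
        }
        return name;
    }

    // NAME: a run of letters, digits, '_' or '.' (a simplification of SimpleType).
    private String name() {
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isLetterOrDigit(c) || c == '_' || c == '.') {
                pos++;
            } else {
                break;
            }
        }
        if (pos == start) {
            throw new IllegalArgumentException("name expected at position " + pos);
        }
        return src.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("'" + c + "' expected at position " + pos);
        }
        pos++;
    }
}
```

In this sketch, PyTypeSketch.parse("Union[Dict[str,str],Dict[str,str]]") returns "(Union (Dict str str) (Dict str str))", while an input with a trailing comma such as "Dict[str,str,]" is rejected with an IllegalArgumentException, because a comma is only accepted when another expr follows it.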