java, antlr, antlr4, grammar, context-free-grammar

How to make the parser decide which alternative to use based on the rule matched in the previous step


I'm using ANTLR 4 to parse a protocol's messages; let's call the protocol 'X'. Before extracting a message's information, I have to check that it complies with X's rules.

Suppose we have to parse X's 'FOO' message, which obeys the following rules:

  1. The message starts with the 'messageIdentifier', which consists of the 3-letter reserved word FOO.
  2. The message contains 5 fields, of which the first 2 are mandatory (must be included) and the remaining 3 are optional (may be omitted).
  3. Fields are separated by the character '/'. If an optional field is omitted, its '/' separator must still be preserved. Trailing optional fields and their associated field separators '/' may be omitted when no further information follows in the message.
  4. A message can span multiple lines. Each line must contain at least one non-empty field (mandatory or optional). Moreover, each line must start with a '/' character and end with a non-empty field followed by a '\n' character. The exception is the first line, which always starts with the reserved word FOO.
  5. Each field also has its own rules regarding the accepted tokens, which are shown in the grammar below.
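
To make rules 1 to 4 concrete, here is a minimal Java sketch that checks only the overall shape of a message (identifier, separators, line breaks, field count). It is not part of the original question: the class name FooShapeCheck and the method hasValidShape are hypothetical, per-field token rules (rule 5) are deliberately ignored, and blanks are simply trimmed at line and field boundaries, which only approximates the grammar's behaviour of skipping spaces.

```java
final class FooShapeCheck {

    /** Returns true if the message satisfies the structural rules 1 to 4 (field contents are not checked). */
    static boolean hasValidShape(String message) {
        // Rule 4: the last line must also end with '\n'.
        if (!message.endsWith("\n")) {
            return false;
        }
        String body = message.substring(0, message.length() - 1);
        String[] lines = body.split("\n", -1);

        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim(); // also drops a trailing '\r'
            // Rules 1 and 4: the first line starts with FOO, every other line with '/'.
            String requiredPrefix = (i == 0) ? "FOO/" : "/";
            // Rules 3 and 4: a line may not end with a separator (i.e. with an empty field).
            if (!line.startsWith(requiredPrefix) || line.endsWith("/")) {
                return false;
            }
            joined.append(line);
        }

        // After joining the lines, the message is FOO followed by 2 to 5 '/'-separated fields.
        String[] parts = joined.toString().split("/", -1);
        int fieldCount = parts.length - 1; // parts[0] is the identifier FOO
        if (fieldCount < 2 || fieldCount > 5) {
            return false;
        }
        // Rule 2: the first two fields are mandatory.
        return !parts[1].trim().isEmpty() && !parts[2].trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasValidShape("FOO/MANDATORY_1/MANDATORY2///100\n"));   // valid sample 5
        System.out.println(hasValidShape("FOO/MANDATORY_1/MANDATORY2/OPT 1//\n")); // non-valid sample 3
    }
}
```

The check works line by line first because the "no trailing '/'" condition applies to every line, not only to the end of the message.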

Examples of valid FOO messages:

  1. FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n

  2. FOO/MANDATORY_1/MANDATORY2\n

    /OPT 1\n

    /HELLO\n

    /100\n

  3. FOO/MANDATORY_1/MANDATORY2\n

  4. FOO/MANDATORY_1/MANDATORY2//HELLO/100\n

  5. FOO/MANDATORY_1/MANDATORY2///100\n

  6. FOO/MANDATORY_1/MANDATORY2/OPT 1\n

  7. FOO/MANDATORY_1/MANDATORY2 ///100\n

Examples of non-valid FOO messages:

  1. FOO\n

    /MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n

  2. FOO/MANDATORY_1/\n

    MANDATORY2/OPT 1/HELLO/100\n

  3. FOO/MANDATORY_1/MANDATORY2/OPT 1//\n

  4. FOO/MANDATORY_1/MANDATORY2/OPT 1/\n

    /100\n

  5. FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n

  6. FOO/MANDATORY_1/MANDATORY2/\n

  7. FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100

The grammar for the above message follows:

grammar Foo_Message;


/* Parser Rules */

startRule : 'FOO' mandatoryField_1 ;

mandatoryField_1 : '/' field_1 NL? mandatoryField_2 ;

mandatoryField_2 : '/' field_2 NL? optionalField_3 ;

optionalField_3 : '/' field_3 NL? optionalField_4
                | '/' optionalField_4
                | optionalField_4
                ;

optionalField_4 : '/' field_4 NL? optionalField_5
                | '/' optionalField_5
                | optionalField_5
                ;

optionalField_5 : '/' field_5 NL?
                | NL
                ;

field_1 : (A | N | B | S)+ ;

field_2 : (A | N)+ ;

field_3 : (A | N | B)+ ;

field_4 : A+ ;

field_5 : N+ ;

/* Lexer Rules */

A : [A-Z]+ ;

N : [0-9]+ ;

B : ' ' -> skip ;

S : [*&@#\-_<>?!]+ ; // '-' is escaped; an unescaped "#-_" would be read as a range from '#' to '_'

NL : '\r'? '\n' ;

The above grammar correctly parses any input that complies with the FOO message's rules. The problem is that it also accepts a line that ends with the '/' character, which is invalid according to the protocol's rules for the FOO message. I understand that the second alternatives of the rules 'optionalField_3', 'optionalField_4' and 'optionalField_5' lead to this behavior, but I can't figure out how to write a rule that prevents it. Somehow I need the parser to remember that it reached the 'optionalField_5' rule after seeing a non-omitted field in the previous rule. If I am not mistaken, this can't be done in ANTLR, because I can't check from which alternative of the previous rule I reached the current rule.

Is there a way to make the parser 'remember' this through some explicit option or rule? Or does my grammar need to be rearranged, and if so, how?


Solution

  • The solution was to refactor my grammar so that each optional field has separate rules for the filled and the empty case.

    kaby76's post is marked as the answer because it helped me reach the solution.

    The refactored grammar:

    grammar Foo_Message;
    
    
    /* Parser Rules */
    
    startRule : 'FOO' mandatoryField_1 endRule ;
    
    mandatoryField_1 : '/' field_1 NL? mandatoryField_2 ;
    
    mandatoryField_2 : '/' field_2 NL? (filledOptionalField_3 | emptyOptionalField_3 )? ;
    
    filledOptionalField_3 : '/' field_3 NL? (filledOptionalField_4 | emptyOptionalField_4)? ;
    emptyOptionalField_3 : '/' (filledOptionalField_4 | emptyOptionalField_4) ;
    
    filledOptionalField_4 : '/' field_4 NL? filledOptionalField_5? ;
    emptyOptionalField_4 : '/' filledOptionalField_5 ;
    
    filledOptionalField_5 : '/' field_5 ;
    
    endRule : NL;
    
    field_1 : (A | N | B | S)+ ;
    
    field_2 : (A | N)+ ;
    
    field_3 : (A | N | B)+ ;
    
    field_4 : A+ ;
    
    field_5 : N+ ;
    
    /* Lexer Rules */
    
    A : [A-Z]+ ;
    
    N : [0-9]+ ;
    
    B : ' ' -> skip ;
    
    S : [*&@#\-_<>?!]+ ; // '-' is escaped; an unescaped "#-_" would be read as a range from '#' to '_'
    
    NL : '\r'? '\n' ;
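
The idea behind the filled/empty split can also be seen outside ANTLR. The sketch below mirrors the refactored parser rules as a java.util.regex pattern built from fragments of the same name. It is only an approximation made for illustration, not the author's code: the class name FooRegexSketch is hypothetical, blanks are removed before matching to imitate the `-> skip` lexer rule, and lexer details such as 'FOO' being a reserved token inside fields are ignored.

```java
import java.util.regex.Pattern;

final class FooRegexSketch {
    // Field contents, approximating field_1 .. field_5 once blanks are removed.
    private static final String NL = "\\r?\\n";
    private static final String F1 = "[A-Z0-9*&@#\\-_<>?!]+";
    private static final String F2 = "[A-Z0-9]+";
    private static final String F3 = "[A-Z0-9]+";
    private static final String F4 = "[A-Z]+";
    private static final String F5 = "[0-9]+";
    private static final String OPT_NL = "(?:" + NL + ")?";

    // An "empty" rule contributes only its '/' and must be followed by a later field,
    // so no alternative can end on a bare separator.
    private static final String FILLED_5 = "/" + F5;
    private static final String FILLED_4 = "/" + F4 + OPT_NL + "(?:" + FILLED_5 + ")?";
    private static final String EMPTY_4 = "/" + FILLED_5;
    private static final String FIELD_4 = "(?:" + FILLED_4 + "|" + EMPTY_4 + ")";
    private static final String FILLED_3 = "/" + F3 + OPT_NL + FIELD_4 + "?";
    private static final String EMPTY_3 = "/" + FIELD_4;

    // startRule: 'FOO' mandatoryField_1 endRule
    private static final Pattern FOO = Pattern.compile(
            "FOO/" + F1 + OPT_NL
            + "/" + F2 + OPT_NL
            + "(?:" + FILLED_3 + "|" + EMPTY_3 + ")?"
            + NL);

    static boolean matches(String message) {
        // Assumption: dropping blanks up front stands in for the lexer's skip rule.
        return FOO.matcher(message.replace(" ", "")).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("FOO/MANDATORY_1/MANDATORY2//HELLO/100\n")); // valid sample 4
        System.out.println(matches("FOO/MANDATORY_1/MANDATORY2/\n"));           // non-valid sample 6
    }
}
```

Because every empty alternative is forced to continue into a later field, a line or message ending in '/' has no way to match, which is the same property the refactored grammar relies on.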