I'm using ANTLR 4 to parse a protocol's messages, let's name it 'X'. Before extracting a message's information , I have to check if it complies with X's rules.
Suppose we have to parse X's 'FOO' message that follows the following rules:
Sample examples of valid FOO messages:
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2\n
/OPT 1\n
/HELLO\n
/100\n
FOO/MANDATORY_1/MANDATORY2\n
FOO/MANDATORY_1/MANDATORY2//HELLO/100\n
FOO/MANDATORY_1/MANDATORY2///100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1\n
FOO/MANDATORY_1/MANDATORY2 ///100\n
Sample examples of non-valid FOO messages:
FOO\n
/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/\n
MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1//\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/\n
/100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2/\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100
Below follows the grammar for the above message:
grammar Foo_Message
/* Parser Rules */
startRule : 'FOO' mandatoryField_1 ;
mandatoryField_1 : '/' field_1 NL? mandatoryField_2 ;
mandatoryField_2 : '/' field_2 NL? optionalField_3 ;
optionalField_3 : '/' field_3 NL? optionalField_4
| '/' optionalField_4
| optionalField_4
;
optionalField_4 : '/' field_4 NL? optionalField_5
| '/' optionalField_5
| optionalField_5
;
optionalField_5 : '/' field_5 NL?
| NL
;
field_1 : (A | N | B | S)+ ;
field_2 : (A | N)+ ;
field_3 : (A | N | B)+ ;
field_4 : A+ ;
field_5 : N+ ;
/* Lexer Rules */
A : [A-Z]+ ;
N : [0-9]+ ;
B : ' ' -> skip ;
S : [*&@#-_<>?!]+ ;
NL : '\r'? '\n' ;
The above grammar parses correctly any input that complies with FOO message's rules. The problem resides in parsing a line that ends with the '/' character, which according to the protocol's FOO message's rules is an invalid input. I understand that the second alternatives of rules 'optionalField_3', 'optionalField_4' and 'optionalField_5' lead to this behavior but I can't figure out how to make a rule for this. Somehow I need the parser to remember that he came to 'optionalField_5' rule after seeing a non-omitted field in the previous rule, which if I am not mistaken can't be done in ANTLR as I can't check from which alternative of the previous rule I reached the current rule.
Is there a way to make the parser 'remember' this by some explicit option-rule? Or does my grammar need to be rearranged and if yes how?
Solution was to refactor my grammar to include rules for filledField
and emptyField
.
kaby76's post is marked as an answer as it helped towards the solution.
The refactored grammar:
grammar Foo_Message
/* Parser Rules */
startRule : 'FOO' mandatoryField_1 endRule ;
mandatoryField_1 : '/' field_1 NL? mandatoryField_2 ;
mandatoryField_2 : '/' field_2 NL? (filledOptionalField_3 | emptyOptionalField_3 )? ;
filledOptionalField_3 : '/' field_3 NL? (filledOptionalField_4 | emptyOptionalField_4)? ;
emptyOptionalField_3 : '/' (filledOptionalField_4 | emptyOptionalField_4) ;
filledOptionalField_4 : '/' field_4 NL? filledOptionalField_5? ;
emptyOptionalField_4 : '/' filledOptionalField_5 ;
filledOptionalField_5 : '/' field_5 ;
endRule : NL;
field_1 : (A | N | B | S)+ ;
field_2 : (A | N)+ ;
field_3 : (A | N | B)+ ;
field_4 : A+ ;
field_5 : N+ ;
/* Lexer Rules */
A : [A-Z]+ ;
N : [0-9]+ ;
B : ' ' -> skip ;
S : [*&@#-_<>?!]+ ;
NL : '\r'? '\n' ;