I'm trying to write a RACC parser, a part of which can be represented by the regular expression a[b][c][d]
.
I came up with the following productions (each token represents the lower case character of the token name):
expr
: A
| A b
;
b
: B
| B c
| c
;
c
: C
| C D
| D
Is this the simplest form, or am I missing something?
If B
. C
, and D
can be distinguished from each other by their first token (which is certainly the case if they are tokens rather than being the simplification of a more complicated grammar), then an alternative is to define non-terminals representing optionality.
optional_B: /* empty */ | B
optional_C: /* empty */ | C
optional_D: /* empty */ | D
expression: A optional_B optional_C optional_D
This gets more complicated when the optional subexpressions cannot be immediately distinguished, because the parser needs to correctly identify that the optional non-terminal matches the empty string based only on the following token. But it sounds like that's not the case with your grammar.