Let's say I have the following EBNF defined for a simple two-term adder:
<expression> ::= <number> <plus> <number>
<number> ::= [0-9]+
<plus> ::= "+"
Shown here.
What would be the proper way to allow any amount of whitespace, other than a newline or carriage return, between the terms? For example, to allow:
1 + 2
1 <tab> + 2
1 + 2
etc.
For example, doing something like the following fails:
<whitespace>::= " " | \t
Furthermore, it seems (almost) every term would be preceded and followed by an optional space. Something like:
<plus> ::= <whitespace>? "+" <whitespace>?
How would that be properly addressed?
The XML standard, as an example, uses the following production for whitespace:
S ::= (#x20 | #x9 | #xD | #xA)+
You could omit CR (#xD) and LF (#xA) if you don't want those.
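As a rough illustration of that trimmed-down production, here is a minimal sketch in Python that expresses the two-term adder as a regular expression, with S reduced to space (#x20) and tab (#x9). The names `S`, `EXPRESSION` and `is_expression` are my own, not part of any standard, and a regex is only a stand-in for a real grammar tool here:

```python
import re

# S without CR and LF: one or more of space (#x20) or tab (#x9).
S = r"[\x20\x09]+"

# Roughly: <expression> ::= <number> S? "+" S? <number>
# Optional whitespace is written as (?:S)? on either side of the plus sign.
EXPRESSION = re.compile(rf"[0-9]+(?:{S})?\+(?:{S})?[0-9]+")


def is_expression(text: str) -> bool:
    """Return True if the whole string matches the two-term adder."""
    return EXPRESSION.fullmatch(text) is not None
```

With this, `is_expression("1 \t+ 2")` is accepted while `is_expression("1\n+ 2")` is rejected, because the newline is not part of S.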
Regarding your observation that grammars could become overwhelmed by whitespace non-terminals, note that whitespace handling can be done in lexical analysis rather than in parsing. See EBNF Grammar for list of words separated by a space.
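To make the lexical-analysis idea concrete, here is a small hand-rolled sketch in Python. It assumes a tokenizer that recognizes spaces and tabs as a whitespace token and simply drops it, so the parser only ever sees `NUMBER PLUS NUMBER`. The token names and the functions `tokenize` and `parse_expression` are hypothetical, chosen just for this example:

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, matched text)

# Newline and carriage return are deliberately absent from WS,
# so they fall through to the error branch in tokenize().
TOKEN_RE = re.compile(r"(?P<NUMBER>[0-9]+)|(?P<PLUS>\+)|(?P<WS>[ \t]+)")


def tokenize(text: str) -> List[Token]:
    """Split text into tokens, discarding runs of spaces and tabs."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":  # whitespace never reaches the parser
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_expression(text: str) -> Tuple[int, int]:
    """Parse <number> <plus> <number>; the grammar itself has no whitespace."""
    tokens = tokenize(text)
    if [kind for kind, _ in tokens] != ["NUMBER", "PLUS", "NUMBER"]:
        raise SyntaxError("expected <number> <plus> <number>")
    return int(tokens[0][1]), int(tokens[2][1])
```

Note that the original three-rule grammar stays untouched with this approach. One side effect of skipping whitespace in the lexer is that leading and trailing spaces or tabs are accepted too, which may or may not be what you want.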