Handling whitespace in EBNF


Let's say I have the following EBNF defined for a simple two-term adder:

<expression>    ::= <number> <plus> <number>
<number>        ::= [0-9]+
<plus>          ::= "+"

Shown here.

What would be the proper way to allow any amount of whitespace except a newline/return between the terms? For example to allow:

1 + 2
1 <tab> + 2
1           + 2

etc.

For example, doing something like the following fails:

<whitespace>::= " " | \t

Furthermore, it seems (almost) every term would be preceded and followed by an optional space. Something like:

<plus>          ::= <whitespace>? "+" <whitespace>?

How would that be properly addressed?


Solution

  • The XML standard, as an example, uses the following production for whitespace:

    S ::= (#x20 | #x9 | #xD | #xA)+
    

    To exclude newlines and carriage returns, omit CR (#xD) and LF (#xA), which leaves only space (#x20) and tab (#x9): S ::= (#x20 | #x9)+

    Regarding your observation that grammars could become overwhelmed by whitespace non-terminals: whitespace can be handled in lexical analysis rather than in parsing. The lexer discards whitespace between tokens, so the grammar rules never have to mention it. See "EBNF Grammar for list of words separated by a space".
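
As a rough illustration of the lexical-analysis approach, here is a minimal Python sketch for the two-term adder from the question. The names `TOKEN_RE`, `tokenize`, and `parse_expression` are made up for this example, and the choice to reject newlines with a `SyntaxError` is an assumption about the desired behaviour. The tokenizer drops runs of spaces and tabs, so the parser only ever sees `<number> <plus> <number>`.

```python
import re

# One alternative per token kind. WS matches only spaces and tabs,
# so a newline or carriage return matches nothing and is rejected.
TOKEN_RE = re.compile(r"(?P<NUMBER>[0-9]+)|(?P<PLUS>\+)|(?P<WS>[ \t]+)")


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, discarding space/tab runs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_expression(text):
    """Parse <number> <plus> <number> and return the sum of the two numbers."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds != ["NUMBER", "PLUS", "NUMBER"]:
        raise SyntaxError(f"expected <number> <plus> <number>, got {kinds}")
    return int(tokens[0][1]) + int(tokens[2][1])
```

With this split, `parse_expression("1 \t +   2")` and `parse_expression("1+2")` are both accepted, while input containing a newline between the terms raises `SyntaxError`. The grammar itself stays exactly as written in the question, with no `<whitespace>?` scattered through the rules.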