python bnf

The "or" (|) in BNF Grammar


I can't seem to fully understand the application of the "or" in BNF Grammar which is denoted by the vertical bar symbol (|). A good example of what gets me confused is the description of string literals in The Python Language Reference. (I've deleted part of the description which is irrelevant to the question):

stringliteral   ::=  [stringprefix](shortstring | longstring)
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
shortstringitem ::=  shortstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
stringescapeseq ::=  "\" <any source character>

So, the way I understand the description of <shortstringitem> is that it can be <shortstringchar> OR <stringescapeseq>. Does this mean it cannot be both at the same time? If I am not mistaken, a single string may contain both at the same time... (For clarity, <shortstringchar>, as I understand it, is the text of my string.)

Thank you.

I searched the web, including Stack Overflow, and watched explanatory videos, but they all seem to describe the "or" with something like:

<letter> ::= A|B|C|D|E...Y|Z.

They stop there without going any deeper into the examples, which unfortunately does not answer my question.


Solution

  • One shortstringitem can only be one or the other. But a shortstring can consist of multiple shortstringitems, each of which is "expanded" independently.

    Consider 'x\n', for example, which you could parse as

    'x\n' -> stringliteral
          -> shortstring
          -> "'"  shortstringitem shortstringitem "'"
          -> "'" shortstringchar stringescapeseq "'"
          -> "'" 'x' '\' 'n' "'"
    

    The first shortstringitem is recognized as a shortstringchar, the second as a stringescapeseq.
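
    The same idea can be sketched in Python. This is only an illustration, not how Python's tokenizer actually works: classify_shortstring is a hypothetical helper that covers just the shortstring rules quoted in the question (no stringprefix, no longstring). The loop plays the role of the * repetition, and the if/else inside it plays the role of the | choice, made afresh for every item.

```python
def classify_shortstring(literal: str) -> list[tuple[str, str]]:
    """Split a quoted short string into (rule, text) pairs, one per shortstringitem.

    Hypothetical helper for illustration; it follows only the grammar
    rules shortstring, shortstringitem, shortstringchar, stringescapeseq.
    """
    # shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a shortstring")
    quote = literal[0]
    body = literal[1:-1]

    items = []
    i = 0
    while i < len(body):  # the "*": zero or more shortstringitems
        ch = body[i]
        # shortstringitem ::= shortstringchar | stringescapeseq
        # Each item takes exactly one branch, independently of the others.
        if ch == "\\":
            # stringescapeseq ::= "\" <any source character>
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            items.append(("stringescapeseq", body[i:i + 2]))
            i += 2
        elif ch == "\n" or ch == quote:
            # shortstringchar excludes newline and the enclosing quote
            raise ValueError(f"character {ch!r} is not allowed here")
        else:
            items.append(("shortstringchar", ch))
            i += 1
    return items


# A raw string, so the source text really is the five characters ' x \ n '
print(classify_shortstring(r"'x\n'"))
# [('shortstringchar', 'x'), ('stringescapeseq', '\\n')]
```

    Every entry in the result is labelled with exactly one rule, yet the string as a whole contains both kinds of item.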