Search code examples
parsingbnf

Backus-Naur form of a token that begins and ends with double quotes


Is it possible to write a Backus-Naur form description of a token that:

  • starts with double quotes
  • ends with double quotes
  • can contain inner double quotes

?

For example I have programming construct called MESSAGE that can look like:

MESSAGE "She said "Hello" to him"

Is that possible to describe with an LL(1) Backus-Naur form?

Edits: let's assume that I'm parsing by simply iterating over each character, one at a time, with one lookahead pointer.

The input string is the whole thing, including MESSAGE.


Solution

  • Something like this might do the trick (Sorry for the syntax. It's a while ago, that I worked with BNF).

    message : MESSAGE message_text
    message_text : " inner_message " | message_text inner_message " inner_message "
    inner_message : [a-z0-9...]+
    

    Your example MESSAGE "She said "Hello" to him" should be interpreted in two steps: First, the second form of message_text gets the hello " to him" and than, secondly, the first form of message_text gets "She said ".

    inner_message must contain all possible characters and spaces.

    As mentioned in the comments, the 'magic' lies in the second definition of message_text. As BNF has no concept of loops or if's, recursion is the only way to describe repetitions. One just need to ensure, that the syntax does not get ambiguous, means that for each possible input only one valid path exists (or none, which declares an error). The above syntax should work except for empty middle strings like in " he replied with "" silence".