Search code examples
gold-parser

BNF grammar + Gold LALR parser, failing to distinguish special case NewLine from Whitespace


  • I want to consider whitespaces and newlines as normal whitespaces.
  • I want to distinguish newlines from other whitespaces moreover to allow special case.

First attempt to write a compliant grammar fails.

Here is the grammar:

! ------------------------------------------------- Sets

{WS}           = {Whitespace} - {CR} - {LF}
{ID Head}      = {Letter} + [_]
{ID Tail}      = {Alphanumeric} + [_]
{String Chars} = {Printable} + {HT} - ["\]

! ------------------------------------------------- Terminals

! The following defines the Whitespace terminal using the {WS}
! set - which excludes the carriage return and line feed 
! characters

Whitespace    = {WS}+ | {CR}{LF} | {CR} | {LF}
!NewLine       = {CR}{LF} | {CR} | {LF}
MyNewLine      = {CR}{LF} | {CR} | {LF}

Solution

  • They are ambiguous because they both contain the same sub-set {CR}{LF} | {CR} | {LF}.

    Given the input {CR}{LF} the parser has no way to tell which terminal it should match.

    A table-driven parser isn't really designed to handle "special cases" directly. If you want to ignore new-lines in some contexts, but attribute meaning to them in others then you'll have to handle that in your reductions (i.e. tokenize the newlines separately, and discard them in your reductions), but that will get ugly.

    A (potentially) better solution is to use tokenizer states (possibly controlled from the parser), to change how the newline inputs are tokenized. It's hard to say without fully understanding your grammar. Plus, it's been a few years since I've messed with this stuff.