Search code examples
algorithmparsinggrammarambiguous-grammar

One-Character parenthesis matching


Given the grammar rule (BNF, | means or):

x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"

, with

  • + left-associative (a+a+a means (a+a)+a),
  • concatenation left-associative (aaa means (aa)a, not a(aa)),
  • and + lazily eating operands (aa+aa means a(a+a)a).

Problem: Is this grammar ambiguous? I.e. is it possible to parse a string in two different ways?

Examples:

Allowed: a, a+a, a+"a", "a+a"+"a+a" (read as (a+a)+(a+a)), ""a"+"a""+"a" (read as ((a)+(a))+(a)), a+a+a, a+"a"+a.

Forbidden: "a+a", +"a", a++a, "a", a+"a, ""a+a"+a".

Application: I hate to escape { and } in LaTeX, so I wanted to make a LaTeX dialect in which only one character needs to be escaped, thus replace both { and } by one character " for example, and write something like ""1+2"/3"^"a+b" instead of {\frac{1+2}{3}}^{a+b}.


Solution

  • Here is a a quick and dirty script using Marpa::R2, a Perl interface to Marpa, a general BNF parser to parse the inputs with the grammar you've provided and its modified version, which supports lazy eating and left assoc, but doesn't forbid "a": code, output.

    The grammar is not ambiguous for the inputs you've provided as parse() would throw an exception otherwise.

    Hope this helps.

    P.S. Using Marpa's general BNF parsing capability to provide a frontend with better syntax for TeX (among others) was discussed in the Marpa community.

    update: re asker's comment

    This grammar (in Marpa SLIF DSL, || means lower precedence)

    x ::= a
       ||    x     '+'     x
       |     x     '+' '"' x '"'
       | '"' x '"' '+'     x
       | '"' x '"' '+' '"' x '"'
       ||    x             x
    

    unambigously parses the inputs in the question except "a+a"+"a+a", for which "x" alternative can be needed (which will make the grammar ambiguous, as rici helpfully suggests in the comment below, more on that in the next para): code, output.

    Overall, with double quotes " serving as parens, '+' as, well, plus, it is tempting to add a sign for an op with lower precedence than '+', e.g. '.' for concatenation and make it a classic term/factor grammar, which can be expressed as follows in Marpa SLIF DSL:

    x ::= a
      || '"' x '"' assoc => group
      || x '+' x
      || x '.' x
    

    Update 1:

    # input: "a+a"+"a+a"
    Setting trace_terminals option
    Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
    Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
    Lexer "L0" accepted lexeme L1c2 e2: a; value="a"
    Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
    Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
    Lexer "L0" accepted lexeme L1c4 e4: a; value="a"
    Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
    Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
    Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
    Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
    Lexer "L0" accepted lexeme L1c7 e7: '"'; value="""
    Lexer "L0" accepted lexeme L1c8 e8: a; value="a"
    Error in SLIF parse: No lexeme found at line 1, column 9
    * String before error: "a+a"+"a
    * The error was at line 1, column 9, and at character 0x002b '+', ...
    * here: +a"
    Marpa::R2 exception at C:\cygwin\home\Ruslan\Marpa-R2-work\q27655176.t line 63.
    
    Progress report is:
    F3 @7-8 L1c7-8 x -> a .
    R7:6 @0-8 L1c1-8 x -> '"' x '"' '+' '"' x . '"'
    # ast dump:
    undef