algorithm parsing grammar ambiguous-grammar

One-Character parenthesis matching

Given the grammar rule (BNF, | means or):

x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"

, with

+ left-associative (a+a+a means (a+a)+a),
concatenation left-associative (aaa means (aa)a, not a(aa)),
and + lazily eating operands (aa+aa means a(a+a)a).

Problem: Is this grammar ambiguous? I.e. is it possible to parse a string in two different ways?

Examples:

Allowed: a, a+a, a+"a", "a+a"+"a+a" (read as (a+a)+(a+a)), ""a"+"a""+"a" (read as ((a)+(a))+(a)), a+a+a, a+"a"+a.

Forbidden: "a+a", +"a", a++a, "a", a+"a, ""a+a"+a".

Application: I hate to escape { and } in LaTeX, so I wanted to make a LaTeX dialect in which only one character needs to be escaped, thus replace both { and } by one character " for example, and write something like ""1+2"/3"^"a+b" instead of {\frac{1+2}{3}}^{a+b}.

Solution

Here is a a quick and dirty script using Marpa::R2, a Perl interface to Marpa, a general BNF parser to parse the inputs with the grammar you've provided and its modified version, which supports lazy eating and left assoc, but doesn't forbid "a": code, output.

The grammar is not ambiguous for the inputs you've provided as parse() would throw an exception otherwise.

Hope this helps.

P.S. Using Marpa's general BNF parsing capability to provide a frontend with better syntax for TeX (among others) was discussed in the Marpa community.

update: re asker's comment

This grammar (in Marpa SLIF DSL, || means lower precedence)

x ::= a
   ||    x     '+'     x
   |     x     '+' '"' x '"'
   | '"' x '"' '+'     x
   | '"' x '"' '+' '"' x '"'
   ||    x             x

unambigously parses the inputs in the question except "a+a"+"a+a", for which "x" alternative can be needed (which will make the grammar ambiguous, as rici helpfully suggests in the comment below, more on that in the next para): code, output.

Overall, with double quotes " serving as parens, '+' as, well, plus, it is tempting to add a sign for an op with lower precedence than '+', e.g. '.' for concatenation and make it a classic term/factor grammar, which can be expressed as follows in Marpa SLIF DSL:

x ::= a
  || '"' x '"' assoc => group
  || x '+' x
  || x '.' x

Update 1:

# input: "a+a"+"a+a"
Setting trace_terminals option
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c2 e2: a; value="a"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c4 e4: a; value="a"
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c7 e7: '"'; value="""
Lexer "L0" accepted lexeme L1c8 e8: a; value="a"
Error in SLIF parse: No lexeme found at line 1, column 9
* String before error: "a+a"+"a
* The error was at line 1, column 9, and at character 0x002b '+', ...
* here: +a"
Marpa::R2 exception at C:\cygwin\home\Ruslan\Marpa-R2-work\q27655176.t line 63.

Progress report is:
F3 @7-8 L1c7-8 x -> a .
R7:6 @0-8 L1c1-8 x -> '"' x '"' '+' '"' x . '"'
# ast dump:
undef