Given the grammar rule (BNF, |
means or):
x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"
, with
+
left-associative (a+a+a
means (a+a)+a
), aaa
means (aa)a
, not a(aa)
), +
lazily eating operands (aa+aa
means a(a+a)a
). Problem: Is this grammar ambiguous? I.e. is it possible to parse a string in two different ways?
Examples:
Allowed: a
, a+a
, a+"a"
, "a+a"+"a+a"
(read as (a+a)+(a+a)
), ""a"+"a""+"a"
(read as ((a)+(a))+(a)
), a+a+a
, a+"a"+a
.
Forbidden: "a+a"
, +"a"
, a++a
, "a"
, a+"a
, ""a+a"+a"
.
Application: I hate to escape {
and }
in LaTeX, so I wanted to make a LaTeX dialect in which only one character needs to be escaped, thus replace both {
and }
by one character "
for example, and write something like ""1+2"/3"^"a+b"
instead of {\frac{1+2}{3}}^{a+b}
.
Here is a a quick and dirty script using Marpa::R2, a Perl interface to Marpa, a general BNF parser to parse the inputs with the grammar you've provided and its modified version, which supports lazy eating and left assoc, but doesn't forbid "a"
:
code, output.
The grammar is not ambiguous for the inputs you've provided as parse() would throw an exception otherwise.
Hope this helps.
P.S. Using Marpa's general BNF parsing capability to provide a frontend with better syntax for TeX (among others) was discussed in the Marpa community.
update: re asker's comment
This grammar (in Marpa SLIF DSL, || means lower precedence)
x ::= a
|| x '+' x
| x '+' '"' x '"'
| '"' x '"' '+' x
| '"' x '"' '+' '"' x '"'
|| x x
unambigously parses the inputs in the question except "a+a"+"a+a"
, for which "x"
alternative can be needed (which will make the grammar ambiguous, as rici helpfully suggests in the comment below, more on that in the next para): code, output.
Overall, with double quotes " serving as parens, '+' as, well, plus, it is tempting to add a sign for an op with lower precedence than '+', e.g. '.' for concatenation and make it a classic term/factor grammar, which can be expressed as follows in Marpa SLIF DSL:
x ::= a
|| '"' x '"' assoc => group
|| x '+' x
|| x '.' x
Update 1:
# input: "a+a"+"a+a"
Setting trace_terminals option
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c2 e2: a; value="a"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c4 e4: a; value="a"
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c7 e7: '"'; value="""
Lexer "L0" accepted lexeme L1c8 e8: a; value="a"
Error in SLIF parse: No lexeme found at line 1, column 9
* String before error: "a+a"+"a
* The error was at line 1, column 9, and at character 0x002b '+', ...
* here: +a"
Marpa::R2 exception at C:\cygwin\home\Ruslan\Marpa-R2-work\q27655176.t line 63.
Progress report is:
F3 @7-8 L1c7-8 x -> a .
R7:6 @0-8 L1c1-8 x -> '"' x '"' '+' '"' x . '"'
# ast dump:
undef