Search code examples
clojuregrammarinstaparse

Resolving ambiguity in simple Instaparse grammar


[Also posted on the Instaparse mailing list, but posted here as well since I'm guessing this is a fairly general problem]

Consider the grammar

 D = (B|S)*
 S = 'S' B*
 B = 'B'

(This is Instaparse's version of BNF...)

B can occur by itself, or after S; if the latter, it should be considered part of the, er, S expression (no pun intended).

Example:

(-> "D = (B|S)*
     S = 'S' B*
     B = 'B'"
    parser
    (parses "BSBB"))

;;=>
([:D [:B "B"] [:S "S"] [:B "B"] [:B "B"]]
 [:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]]    ;; <------
 [:D [:B "B"] [:S "S" [:B "B"]] [:B "B"]])

I'd like only the second result to match -- so that B gets included inside S when possible, and to remove the other options. What needs to be done to my parser to make this change?

More example expressions shown in this gist.


Solution

  • You can use negative lookahead to postulate that matches of S must not be followed by valid Bs:

    (-> "
    
    D = (B|S)*
    S = 'S' B* !B
    B = 'B'
    
    "
    insta/parser
    (insta/parses "BSBB"))
    ;= ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])
    

    This works for all the examples in (the current version of) your gist as well.