[Also posted on the Instaparse mailing list, but posted here as well since I'm guessing this is a fairly general problem]
Consider the grammar
D = (B|S)*
S = 'S' B*
B = 'B'
(This is Instaparse's version of BNF...)
B can occur by itself, or after S; if the latter, it should be considered part of the, er, S expression (no pun intended).
Example:
(-> "D = (B|S)*
S = 'S' B*
B = 'B'"
parser
(parses "BSBB"))
;;=>
([:D [:B "B"] [:S "S"] [:B "B"] [:B "B"]]
[:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]] ;; <------
[:D [:B "B"] [:S "S" [:B "B"]] [:B "B"]])
I'd like only the second result to match -- so that B gets included inside S when possible, and to remove the other options. What needs to be done to my parser to make this change?
More example expressions shown in this gist.
You can use negative lookahead to postulate that matches of S
must not be followed by valid B
s:
(-> "
D = (B|S)*
S = 'S' B* !B
B = 'B'
"
insta/parser
(insta/parses "BSBB"))
;= ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])
This works for all the examples in (the current version of) your gist as well.