I'm using grako (a PEG parser generator library for Python) to parse a simple declarative language where a document can contain one or more protocols.
Originally, I had the root rule for document written as:
document = {protocol}+ ;
This appropriately returns a list of protocols, but only gives helpful errors if a syntax error is in the first protocol. Otherwise, it silently discards the invalid protocol and everything after it.
I have also tried a few variations on:
document = protocol document | $ ;
But this doesn't produce a list when there's only one protocol, and it doesn't give helpful error messages either: if any of the protocols contains an error, it only reports "no available options: (...) document".
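How do I write a rule that does both of the following?
- returns a list of protocols, even when there is only one;
- reports a helpful error when any protocol, not just the first, contains a syntax error.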
This is the solution:
document = {protocol ~ }+ $ ;
If you don't add the $ so that the parser checks for the end of file, the parse will succeed after one or more protocols even if there is still input left to parse.
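To see why the $ matters, here is a minimal hand-written sketch in Python. It is not grako's implementation, and the toy syntax protocol NAME ; along with tokenize, parse_protocol and parse_document are hypothetical names for illustration. Like a {protocol}+ closure, the loop stops at the first protocol that fails and backtracks to its start. Only the end-of-input check turns that leftover input into an error.

```python
import re
from typing import List, Tuple


class ParseError(Exception):
    """A parse failure, remembering the token position where it happened."""

    def __init__(self, message: str, pos: int) -> None:
        super().__init__(f"{message} at token {pos}")
        self.pos = pos


def tokenize(text: str) -> List[str]:
    # Hypothetical toy language: identifiers and ';' are the only tokens.
    return re.findall(r"[A-Za-z_]\w*|;", text)


def parse_protocol(tokens: List[str], pos: int) -> Tuple[str, int]:
    # Hypothetical rule: protocol = 'protocol' NAME ';' ;
    if pos >= len(tokens) or tokens[pos] != "protocol":
        raise ParseError("expecting 'protocol'", pos)
    if pos + 1 >= len(tokens) or tokens[pos + 1] == ";":
        raise ParseError("expecting a protocol name", pos + 1)
    if pos + 2 >= len(tokens) or tokens[pos + 2] != ";":
        raise ParseError("expecting ';'", pos + 2)
    return tokens[pos + 1], pos + 3


def parse_document(tokens: List[str], require_eof: bool) -> List[str]:
    # {protocol}+ : repeat until an iteration fails, then backtrack to its start.
    protocols: List[str] = []
    pos = 0
    while True:
        try:
            name, pos = parse_protocol(tokens, pos)
        except ParseError:
            break  # the closure ends quietly; the error is discarded
        protocols.append(name)
    if not protocols:
        raise ParseError("expecting 'protocol'", 0)
    # $ : everything must have been consumed, otherwise report where parsing stopped.
    if require_eof and pos != len(tokens):
        raise ParseError("expecting end of input", pos)
    return protocols
```

With require_eof=False, the input "protocol a ; protocol b protocol c ;" yields ['a'] and silently drops the rest. With require_eof=True it raises ParseError at token 3, the start of the broken protocol.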
Adding the cut expression (~) makes the parser commit to what was parsed in the closest enclosing option/choice (a closure behaves like the choice X = a X | () ;). Additional cut expressions within the protocol rule will make the error messages land closer to the actual point of failure in the input.
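A minimal hand-written sketch in Python can illustrate what the cuts buy you. It models standard PEG cut semantics rather than grako's code, and the toy syntax protocol NAME ; plus every function name in it are hypothetical. The cut is placed inside the protocol rule, after the 'protocol' keyword. A failure past that cut raises CutError, which the closure is not allowed to swallow, so the error is reported at the offending token instead of at the start of the protocol. The loop never revisits a protocol once it has been parsed, which loosely corresponds to the ~ after protocol inside the closure.

```python
import re
from typing import Callable, List, Tuple


class ParseError(Exception):
    """An ordinary failure: an enclosing choice or closure may backtrack past it."""

    def __init__(self, message: str, pos: int) -> None:
        super().__init__(f"{message} at token {pos}")
        self.pos = pos


class CutError(ParseError):
    """A failure after a cut (~): enclosing choices and closures must not backtrack."""


def tokenize(text: str) -> List[str]:
    # Hypothetical toy language: identifiers and ';' are the only tokens.
    return re.findall(r"[A-Za-z_]\w*|;", text)


def expect(tokens: List[str], pos: int, ok: Callable[[str], bool],
           what: str, committed: bool) -> int:
    # Consume one token matching `ok`, or fail; after a cut the failure is a CutError.
    if pos < len(tokens) and ok(tokens[pos]):
        return pos + 1
    raise (CutError if committed else ParseError)(f"expecting {what}", pos)


def parse_protocol(tokens: List[str], pos: int) -> Tuple[str, int]:
    # Hypothetical rule: protocol = 'protocol' ~ NAME ';' ;
    pos = expect(tokens, pos, lambda t: t == "protocol", "'protocol'", committed=False)
    # Past the cut: this input must be a protocol, so errors are not backtracked over.
    pos = expect(tokens, pos, lambda t: t != ";", "a protocol name", committed=True)
    name = tokens[pos - 1]
    pos = expect(tokens, pos, lambda t: t == ";", "';'", committed=True)
    return name, pos


def parse_document(text: str) -> List[str]:
    # document = {protocol ~}+ $ ;
    tokens = tokenize(text)
    protocols: List[str] = []
    pos = 0
    while True:
        try:
            name, pos = parse_protocol(tokens, pos)
        except CutError:
            raise  # committed: surface the error at the offending token
        except ParseError:
            break  # no protocol starts here, so the closure ends
        protocols.append(name)  # a parsed protocol is never revisited
    if not protocols:
        raise ParseError("expecting 'protocol'", 0)
    if pos != len(tokens):
        raise ParseError("expecting end of input", pos)
    return protocols
```

Here parse_document("protocol a ; protocol b protocol c ;") raises CutError("expecting ';' at token 5"), pointing inside the second protocol. With committed=False everywhere (no cut), the same input would only fail with "expecting end of input at token 3".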