Tags: python, parsing, ebnf, peg, grako

Parse one or more expressions with helpful errors


I'm using grako (a PEG parser generator library for Python) to parse a simple declarative language in which a document can contain one or more protocols.

Originally, I had the root rule for document written as:

document = {protocol}+ ;

This appropriately returns a list of protocols, but only gives helpful errors if a syntax error is in the first protocol. Otherwise, it silently discards the invalid protocol and everything after it.
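The behaviour described above follows from how a PEG closure works. Below is a minimal sketch in plain Python (a toy model, not grako's implementation) of what {protocol}+ does. It assumes a hypothetical protocol syntax of "protocol NAME ;" over whitespace-separated tokens. ParseError and parse_protocol are illustrative names, not grako APIs. The first repetition is mandatory, so its error reaches the caller. Every later repetition runs inside a try that treats failure as "the closure is finished".

```python
class ParseError(Exception):
    """Raised when a rule fails to match at a token position."""


def parse_protocol(tokens, pos):
    # Hypothetical protocol syntax for illustration: 'protocol' NAME ';'
    if pos >= len(tokens) or tokens[pos] != "protocol":
        raise ParseError(f"expecting 'protocol' at token {pos}")
    if pos + 1 >= len(tokens) or not tokens[pos + 1].isidentifier():
        raise ParseError(f"expecting protocol name at token {pos + 1}")
    if pos + 2 >= len(tokens) or tokens[pos + 2] != ";":
        raise ParseError(f"expecting ';' at token {pos + 2}")
    return tokens[pos + 1], pos + 3


def parse_document_closure(tokens):
    """Model of: document = {protocol}+ ;  (no end-of-input check)."""
    # The first repetition is mandatory, so its error propagates to the caller.
    name, pos = parse_protocol(tokens, 0)
    results = [name]
    # Later repetitions are optional: a failure just ends the closure.
    while True:
        try:
            name, pos = parse_protocol(tokens, pos)
        except ParseError:
            break  # backtrack to the last good position and stop silently
        results.append(name)
    return results, pos


tokens = "protocol a ; protocol ; protocol c ;".split()
print(parse_document_closure(tokens))  # (['a'], 3): 'protocol c' is never reached
```

Nothing after the closure checks that the whole input was consumed. As a result, the broken second protocol and the valid third one both disappear without an error.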

I have also tried a few variations on:

document = protocol document | $ ;

But this doesn't produce a list when there's only one protocol, and it doesn't give helpful error messages either. If any of the protocols contains an error, it says only "no available options: (...) document".
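The recursive variant can be modeled the same way. This is again a plain Python sketch rather than grako's implementation, using the same hypothetical protocol syntax. The ordered choice catches the failure from the first alternative and falls back to the end-of-input check. When that check also fails, the only thing left to report is a generic message. The result is modeled as nested (name, rest) pairs to show why it isn't a flat list; grako's actual AST shape may differ.

```python
class ParseError(Exception):
    """Raised when a rule fails to match at a token position."""


def parse_protocol(tokens, pos):
    # Hypothetical protocol syntax for illustration: 'protocol' NAME ';'
    if pos >= len(tokens) or tokens[pos] != "protocol":
        raise ParseError(f"expecting 'protocol' at token {pos}")
    if pos + 1 >= len(tokens) or not tokens[pos + 1].isidentifier():
        raise ParseError(f"expecting protocol name at token {pos + 1}")
    if pos + 2 >= len(tokens) or tokens[pos + 2] != ";":
        raise ParseError(f"expecting ';' at token {pos + 2}")
    return tokens[pos + 1], pos + 3


def parse_document_recursive(tokens, pos=0):
    """Model of: document = protocol document | $ ;  (PEG ordered choice)."""
    # Alternative 1: protocol document
    try:
        name, next_pos = parse_protocol(tokens, pos)
        rest, end = parse_document_recursive(tokens, next_pos)
        return (name, rest), end
    except ParseError:
        pass  # ordered choice: discard this failure and try the next option
    # Alternative 2: $ (end of input)
    if pos == len(tokens):
        return None, pos
    # Both options failed; the specific reason from alternative 1 is gone.
    raise ParseError(f"no available options at token {pos}")


print(parse_document_recursive("protocol a ;".split()))  # (('a', None), 3)
```

With the input "protocol a ; protocol ; protocol c ;", the missing name is swallowed at two levels of recursion. The reported error ends up as "no available options at token 0", and the actual reason is lost.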

How do I write a rule that does both of the following?

  1. Always returns a list, even if there's only one protocol
  2. Displays helpful error messages about the unsuccessful match, instead of just saying it's an invalid document or silently dropping the damaged protocol

Solution

  • This is the solution:

    document = {protocol ~ }+ $ ;
    

    If you don't add the $ so that the parser checks for the end of the input, the parse will succeed after one or more protocols even if there is more input left to parse.

    Adding the cut expression (~) makes the parser commit to what has been parsed so far in the closest enclosing option or choice. A closure counts as an option, because {a} is equivalent to X = a X | () ;. Additional cut expressions inside the protocol rule will make the error messages point closer to where the input actually fails.
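To see how the cut and the end-of-input check interact, here is another plain Python model; it is not grako's implementation and does not reproduce its error messages. It assumes the hypothetical protocol rule has a cut right after its keyword (protocol ~ NAME ;), which is the kind of additional cut recommended in the answer. CutError is an illustrative name for a failure that may not be backtracked over. The closure stops quietly only when no protocol starts at the current position. The final position check plays the role of $.

```python
class ParseError(Exception):
    """Soft failure: the enclosing choice or closure may backtrack."""


class CutError(ParseError):
    """Hard failure after a cut (~): no backtracking past this point."""


def parse_protocol(tokens, pos):
    # Hypothetical syntax: 'protocol' ~ NAME ';'  (cut after the keyword)
    if pos >= len(tokens) or tokens[pos] != "protocol":
        raise ParseError(f"expecting 'protocol' at token {pos}")
    # Past the cut: any failure from here on is committed.
    if pos + 1 >= len(tokens) or not tokens[pos + 1].isidentifier():
        raise CutError(f"expecting protocol name at token {pos + 1}")
    if pos + 2 >= len(tokens) or tokens[pos + 2] != ";":
        raise CutError(f"expecting ';' at token {pos + 2}")
    return tokens[pos + 1], pos + 3


def parse_document(tokens):
    """Model of: document = {protocol ~}+ $ ;"""
    name, pos = parse_protocol(tokens, 0)
    results = [name]  # always a list, even for a single protocol
    while True:
        try:
            name, pos = parse_protocol(tokens, pos)
        except CutError:
            raise  # committed: surface the specific error to the caller
        except ParseError:
            break  # no protocol starts here, so the closure ends
        results.append(name)
    if pos != len(tokens):  # the $ check: all input must be consumed
        raise ParseError(f"expecting end of input at token {pos}")
    return results


print(parse_document("protocol a ;".split()))  # ['a']
```

A single protocol still yields a one-element list. A broken protocol reports the specific missing piece, for example "expecting protocol name at token 4" for "protocol a ; protocol ; protocol c ;". Trailing text that isn't a protocol at all is caught by the end-of-input check instead of being silently ignored.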