Is there an easy way to add custom error messages in pyparsing?

Suppose we have a grammar in which an expected element is not found. Assume that backtracking is disabled, i.e. the element is preceded by the - operator.

Is there any way in the library to set a custom error message, such as "Foo was needed", on that element? Or is the only way to catch the parse exception and work it out using the location information?

Say:

from pyparsing import *

grammar = (
    Literal("//")
    - Word(alphanums)("name")
    + Suppress(White()[1,...])
    + Word(alphanums)("operation")
).leave_whitespace()

grammar.run_tests("""
//name op
// opnoname
""", print_results=True)

The output for the 2nd line is:

// opnoname
  ^
ParseException: Expected W:(0-9A-Za-z), found ' '  (at char 2), (line:1, col:3)
FAIL: Expected W:(0-9A-Za-z), found ' '  (at char 2), (line:1, col:3)

I'd like a custom message, such as "name was needed", instead of the generic "Expected W:(0-9A-Za-z), found ' '".

So far it looks like catching the ParseException, modifying its message, and re-raising it would be a solution. Am I missing something more fundamental?

For those curious: this came up when writing a JCL (Job Control Language) parser.

I'm using the latest stable version of pyparsing at the moment: 3.0.9.

Solution

They aren't custom error messages, but you can use set_name to give nice names to the different elements of your grammar.

identifier = Word(alphanums).set_name("identifier")
grammar = (
    Literal("//")
    - identifier("name")
    + White().suppress()
    + identifier("operation")
).leave_whitespace()
grammar.set_name("grammar")

Note the difference between set_name and set_results_name. set_name gives a name to the expression itself, while set_results_name (which you implicitly call when using the expr("name") notation) is what assigns names to the parsed results.
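To make the distinction concrete, here is a minimal sketch (the input strings "//job1" and "//!" are made up for illustration). The results name is the key used to look up the parsed token, while the expression name is what appears in the error text:

```python
import pyparsing as pp

# set_name labels the expression itself; the label shows up in error messages.
identifier = pp.Word(pp.alphanums).set_name("identifier")

# identifier("name") is shorthand for identifier.set_results_name("name");
# it labels the matched token in the parse results.
expr = pp.Literal("//") - identifier("name")

# Successful parse: the results name is the lookup key.
result = expr.parse_string("//job1")
job_name = result["name"]

# Failed parse: the message refers to the expression name, not the results name.
try:
    expr.parse_string("//!")
except pp.ParseBaseException as err:
    error_text = str(err)

print(job_name)
print(error_text)
```

The error text should start with "Expected identifier", not with anything derived from the results name "name".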

It is easier to see the distinction if you generate a railroad diagram, and add show_results_names=True:

grammar.create_diagram("grammar.html", show_results_names=True)

I prefer to let pyparsing deal with whitespace whenever I can, rather than spelling it out in my grammar. If you want to enforce no spaces between the leading '//' and the name identifier, you can write it like this:

grammar = (
    Literal("//")
    - identifier("name").leave_whitespace()
    + identifier("operation")
)

Pyparsing's implicit whitespace skipping will take care of the spaces between name and operation. leave_whitespace on the name alone tells pyparsing not to skip whitespace before parsing the name identifier. But you may have other plans for further parts of this grammar, so I'll leave it up to you which way to go on this.

I'm glad to see you are using run_tests! Here is a tip: you can insert comments in your tests, and they will show up as labels for each test in your output, like this:

grammar.run_tests("""\
# successful expression
//name op
# more than one space between name and operation - still works!
//name   op
# failing expression, missing second identifier
// opnoname
# failing expression, name but no operation
//namenoopn
# failing expression, space after '//'
// name op
""")

and get this output:

# successful expression
//name op
['//', 'name', 'op']
- name: 'name'
- operation: 'op'

# more than one space between name and operation - still works!
//name   op
['//', 'name', 'op']
- name: 'name'
- operation: 'op'

# failing expression, missing second identifier
// opnoname
// opnoname
  ^
ParseSyntaxException: Expected identifier, found ' '  (at char 2), (line:1, col:3)
FAIL: Expected identifier, found ' '  (at char 2), (line:1, col:3)

# failing expression, name but no operation
//namenoopn
//namenoopn
           ^
ParseSyntaxException: Expected identifier, found end of text  (at char 11), (line:1, col:12)
FAIL: Expected identifier, found end of text  (at char 11), (line:1, col:12)

# failing expression, space after '//'
// name op
// name op
  ^
ParseSyntaxException: Expected identifier, found ' '  (at char 2), (line:1, col:3)
FAIL: Expected identifier, found ' '  (at char 2), (line:1, col:3)

You can also insert blank lines between tests for readability, just like you would insert blank lines in your Python code.
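As for the catch-and-re-raise idea from the question, here is a minimal sketch of how it might look. The helper parse_line and the MESSAGES table are hypothetical names, not part of pyparsing, and keying the lookup on the failing element's results name is just one possible choice. Note that with the - operator the error is raised as a ParseSyntaxException, which is not a subclass of ParseException, so the sketch catches their common base class ParseBaseException:

```python
import pyparsing as pp

identifier = pp.Word(pp.alphanums).set_name("identifier")
grammar = (
    pp.Literal("//")
    - identifier("name").leave_whitespace()
    + identifier("operation")
)

# Hypothetical table mapping results names to friendlier messages.
MESSAGES = {
    "name": "name was needed",
    "operation": "operation was needed",
}


def parse_line(text):
    """Parse one line, swapping in a custom message when one is defined."""
    try:
        return grammar.parse_string(text, parse_all=True)
    except pp.ParseBaseException as err:
        # parser_element is the expression that failed to match.
        element = err.parser_element
        key = getattr(element, "resultsName", None)
        if key not in MESSAGES:
            raise  # no custom message for this element, keep the original error
        # Reuse the original string and location so the caret still lines up.
        raise pp.ParseFatalException(
            err.pstr, err.loc, MESSAGES[key], element
        ) from None


try:
    parse_line("// opnoname")
except pp.ParseFatalException as exc:
    print(exc)
```

This keeps the location information of the original exception and only replaces the message, so it can be combined with set_name for the elements that do not need special wording.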