pyparsing list of lists and values


A mock version of the code that I am trying is the following. I have many more cases of what a SingleValue is, and other constructions, but this is the part I am failing to represent.

import pyparsing as pp

SingleValue = pp.Word(pp.alphas)  # Single values are strings of letters
ListOfValues = pp.Forward()  # To be defined later
Values = SingleValue ^ ('{' + ListOfValues + '}')  # Values are either words or braces-enclosed lists
ListOfValues <<= pp.delimited_list(Values, delim=' ')  # The lists inside the braces are space-separated values.

print(ListOfValues.parse_string('{aaa bbb {ccc ddd}}'))

The comments indicate what I expect each line to represent. The intention is that the string in the example is a valid ListOfValues: a brace-enclosed, space-separated list of either words or further lists.

The sample code gives

pyparsing.exceptions.ParseException: Expected {W:(A-Za-z) ^ {'{' Forward: None '}'}}, found 'bbb'  (at char 5), (line:1, col:6)

I also tried

Values = SingleValue ^ ListOfValues
ListOfValues <<= '{' + pp.delimited_list(Values, delim=' ') + '}'

This gives

pyparsing.exceptions.ParseException: Expected '}', found 'bbb'  (at char 5), (line:1, col:6)

How should I define this grammar?


It looks like the following works in this case

SingleValue = pp.Word(pp.alphas)  # Single values are strings of letters
Values = pp.Forward()
ListOfValues = '{' + pp.delimited_list(Values[...], delim=' ') + '}'  # The lists inside the braces are space-separated values.
Values <<= SingleValue | ListOfValues  # Values are either words or braces-enclosed lists

print(ListOfValues.parse_string('{aaa bbb {ccc ddd}}'))

This gives

['{', 'aaa', 'bbb', '{', 'ccc', 'ddd', '}', '}']

However, I still don't understand why the first version in the question fails.
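
The difference between the failing and working versions can be reproduced in isolation. The following is a minimal sketch, assuming pyparsing 3.x; `word` is a hypothetical stand-in for SingleValue. It suggests that a `' '` delimiter never gets a chance to match, while `[...]` repetition does not need a delimiter at all.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)  # hypothetical stand-in for SingleValue

# delim=' ' becomes a literal; the space is skipped as leading whitespace
# before that literal is tried, so the list ends after the first item.
only_first = pp.delimited_list(word, delim=' ').parse_string('aaa bbb')
print(only_first.as_list())  # ['aaa']

# word[...] (zero or more) relies on the same whitespace skipping,
# so it consumes every word without needing a delimiter.
all_words = word[...].parse_string('aaa bbb')
print(all_words.as_list())  # ['aaa', 'bbb']
```

In the working version above, `Values[...]` therefore does all the repetition, and the surrounding `delimited_list` with `delim=' '` contributes nothing.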


Solution

  • By default, pyparsing skips leading whitespace before it tries to match each expression (see 1.1.2 Usage notes).

    That means Literal(' ') never matches, because the space has already been skipped, so delimited_list stops after the first item.

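    To match whitespace explicitly, there is pp.White, which does not skip the characters it is asked to match. Using the definitions from your second attempt: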

    ListOfValues <<= '{' + pp.delimited_list(Values, delim=pp.White(' ')) + '}'
    

    You could also use Values[...] instead, although it accepts any amount and kind of whitespace (spaces, tabs, newlines) between values:

    ListOfValues <<= '{' + Values[...] + '}'
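
Since the goal is a list of lists, it may also help to keep the nesting in the result instead of a flat token list. The following is a minimal sketch, assuming pyparsing 3.x, that wraps each braced list in pp.Group and suppresses the braces. The lowercase names are hypothetical and not part of the original code.

```python
import pyparsing as pp

# Suppressed braces are matched but left out of the results
LBRACE, RBRACE = map(pp.Suppress, "{}")

single_value = pp.Word(pp.alphas)   # words made of letters
value = pp.Forward()                # defined below, after list_of_values

# Group keeps the contents of each braced list together as a sub-list
list_of_values = pp.Group(LBRACE + value[...] + RBRACE)
value <<= single_value | list_of_values

result = list_of_values.parse_string('{aaa bbb {ccc ddd}}', parse_all=True)
print(result.as_list())  # [['aaa', 'bbb', ['ccc', 'ddd']]]
```

Passing parse_all=True makes the parser raise an exception if anything is left unparsed after the closing brace, rather than silently stopping early.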