
Why does pyparsing's Optional always return a list?


I'm trying to get the LIMIT value from a SQL statement:

from pyparsing import Group, Optional, pyparsing_common

query = "LIMIT 1"

LIMIT = "LIMIT"

int_num = pyparsing_common.signed_integer()

limit_clause = Optional(Group(LIMIT + int_num), None)
statement = limit_clause("limit")


if __name__ == "__main__":
    result = statement.parseString(query)
    print(result["limit"])

This prints [['LIMIT', 1]]

This is of course a contrived example, but why does it return [['LIMIT', 1]] instead of just 1? Is there a way to make it return just 1?


Solution

  • According to the documentation of pyparsing:

    • the operator + is an Expression operator that "creates And using the expressions before and after the operator",
    • the class And is an Expression subclass that "construct with a list of ParserElements, all of which must match for And to match",
    • the class Group is a special subclass that "causes the matched tokens to be enclosed in a list",
    • the class Optional is an Expression subclass that "construct with a ParserElement, but this element is not required to match; can be constructed with an optional default argument, ...".

    So, roughly, the + operator collects the tokens matched by 'LIMIT' and pyparsing.pyparsing_common.signed_integer() into a single flat list, and the class Group then wraps that list in another list. This explains why both 'LIMIT' and 1 appear in the result, and why they are inside nested lists.
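    A minimal sketch of these two steps, assuming pyparsing 2.4.7 as in this answer (the camelCase names used here are also kept as synonyms in pyparsing 3). It parses "LIMIT 1" with the + expression alone and then with Group around it. Literal("LIMIT") stands in for the plain string 'LIMIT', which + converts to a Literal anyway.

```python
from pyparsing import Group, Literal, pyparsing_common

# Same building blocks as the question; the plain string "LIMIT" would be
# converted to a Literal by + in any case
LIMIT = Literal("LIMIT")
int_num = pyparsing_common.signed_integer

# '+' builds an And: its matched tokens come back as one flat sequence
flat = LIMIT + int_num
# Group wraps those same tokens in one extra level of nesting
grouped = Group(LIMIT + int_num)

# asList() turns the ParseResults into plain Python lists for printing
flat_tokens = flat.parseString("LIMIT 1").asList()
grouped_tokens = grouped.parseString("LIMIT 1").asList()

print(flat_tokens)     # ['LIMIT', 1]
print(grouped_tokens)  # [['LIMIT', 1]]
```

    The integer is already an int (not the string "1") because signed_integer carries a parse action that converts its token.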

    The reality is a little more complex, because the returned objects are not lists, but instances of the class pyparsing.ParseResults. Running the following code:

    import pyparsing
    
    # construct parser
    LIMIT = 'LIMIT'
    int_num = pyparsing.pyparsing_common.signed_integer()
    limit_clause = pyparsing.Optional(pyparsing.Group(LIMIT + int_num), None)
    statement = limit_clause('limit')
    # parse a string
    query = 'LIMIT 1'
    result = statement.parseString(query)
    print(repr(result))
    

    prints:

    ([(['LIMIT', 1], {})], {'limit': [([(['LIMIT', 1], {})], {})]})
    

    then the statement print(repr(result['limit'])) prints:

    ([(['LIMIT', 1], {})], {})
    

    and the statement print(str(result['limit'])) prints:

    [['LIMIT', 1]]
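    Given that structure, here is a sketch of two ways to get the bare 1, again assuming pyparsing 2.4.7 (with the same camelCase names still available in pyparsing 3). The first keeps the question's grammar and indexes through both nesting levels. The second is an alternative grammar, not taken from the question: it suppresses the keyword with CaselessKeyword("LIMIT").suppress(), drops the Group, and attaches the results name to the integer itself. Names such as LIMIT_KW and limit_clause2 are illustrative only.

```python
import pyparsing

int_num = pyparsing.pyparsing_common.signed_integer

# Option 1: keep the question's grammar and index through both levels.
limit_clause = pyparsing.Optional(pyparsing.Group("LIMIT" + int_num), None)
statement = limit_clause("limit")
result = statement.parseString("LIMIT 1")

nested = result["limit"].asList()    # plain lists: [['LIMIT', 1]]
limit_value = result["limit"][0][1]  # named result -> Group -> second token
print(nested)       # [['LIMIT', 1]]
print(limit_value)  # 1

# Option 2 (alternative grammar): drop the keyword from the output,
# drop the Group, and name the integer token directly.
LIMIT_KW = pyparsing.CaselessKeyword("LIMIT").suppress()
limit_clause2 = pyparsing.Optional(LIMIT_KW + int_num("limit"))

result2 = limit_clause2.parseString("LIMIT 1")
print(result2["limit"])  # 1

# When the clause is absent, the name is simply not set.
result3 = limit_clause2.parseString("")
print(result3.get("limit"))  # None
```

    Naming the integer rather than the Optional means the results name points at a single token, so lookup returns the value itself instead of a ParseResults.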
    

    For posterity, this answer uses pyparsing == 2.4.7. The current development version of pyparsing on GitHub has been significantly restructured from a single module into a package, notably in commit 0b398062710dc00b952636bcf7b7933f74f125da.

    A few version-related comments about the class ParseResults, which is used to represent each parser's result: