python pyparsing "^" vs "|" keywords


I have created a small test case to illustrate a problem I am seeing with the "^" operator. When I try to use the ^ operator instead of the | operator below, I get an error.

Edit: To make the question clearer (although it has already been answered) for anyone else reading it: why can I not use the "^" operator in place of the "|" operator in the following program?

The testcase:

import unittest
import pyparsing as pp

def _get_verilog_num_parse():
    """Get a parser that can read a verilog number
    return: Parser for verilog numbers
    rtype: PyParsing parser object
    """
    apos           = pp.Suppress(pp.Literal("'"))
    radix          = pp.Word('bdhBDH', exact=1).setResultsName('radix')
    dec_num        = pp.Word(pp.nums+'_'   ).setParseAction(lambda x:int(x[0].replace('_', ''),10))
    hex_num        = pp.Word(pp.hexnums+'_').setParseAction(lambda x:int(x[0].replace('_', ''),16))
    bin_num        = pp.Word('01'+'_'      ).setParseAction(lambda x:int(x[0].replace('_', ''),2))
    size           = pp.Optional(dec_num).setResultsName('size')
    valid_nums     = {'b':bin_num,'d':dec_num,'h':hex_num}
    verilog_lits   = pp.Forward()

    def size_mask(parser):
        size = parser.get('size')
        if size is not None:
            print("In size_mask. size: {} parser[value]: {}".format(size, parser['value']))
            return parser['value'] & ((1<<size) -1)
        else:
            print("In size_mask. no size. parser[value]: {}".format(parser['value']))
            return parser['value']

    def radix_parse_action(toks):
        verilog_lits << (valid_nums[toks.get('radix').lower()])
    radix.addParseAction(radix_parse_action)
    #return size, apos + radix + verilog_lits
    return (size + apos + radix + verilog_lits.setResultsName('value')).addParseAction(size_mask)

class CheckPyParsing(unittest.TestCase):
    '''Check that the Expression Parser works with the expressions
    defined in this test'''

    def test_or(self):
        """Check basic expressions not involving referenced parameters"""
        expressions_to_test = [
                ("8'd255",255),
                ("'d255",255),
                ]
        parser = _get_verilog_num_parse() | pp.Literal("Some_demo_literal")
        for expr,expected in expressions_to_test:
            result = parser.parseString(expr)
            print("result: {}, val: {}".format(result, result[0]))
            self.assertEqual(expected,result[0], "test_string: {} expected: {} result: {}".format(expr, expected, result[0]))

When I use the | I get this:

test_or (yoda_interface.tests.CheckPyParsing_test.CheckPyParsing)
Check basic expressions not involving referenced parameters ... In size_mask. size: 8 parser[value]: 255
result: [255], val: 255
In size_mask. no size. parser[value]: 255
result: [255], val: 255
ok

When I use the ^ I get:

test_or (yoda_interface.tests.CheckPyParsing_test.CheckPyParsing)
Check basic expressions not involving referenced parameters ... ERROR

======================================================================
ERROR: test_or (yoda_interface.tests.CheckPyParsing_test.CheckPyParsing)
Check basic expressions not involving referenced parameters
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\check_private\yoda_interface\tests\CheckPyParsing_test.py", line 45, in test_or
    result = parser.parseString(expr)
  File "C:\Users\gkuhn\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyparsing.py", line 1125, in parseString
    raise exc
  File "C:\Users\gkuhn\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyparsing.py", line 1115, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "C:\Users\gkuhn\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyparsing.py", line 989, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\gkuhn\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyparsing.py", line 2440, in parseImpl
    raise maxException
pyparsing.ParseException:  (at char 3), (line:1, col:4)

----------------------------------------------------------------------
Ran 1 test in 0.012s

FAILED (errors=1)

Solution

  • This is a difficult case, and it requires a little understanding of some of pyparsing's internals to refactor your parser into a working version.

    Your "perfect storm" of problems combines these factors:

    • a dynamic parser element (verilog_lits)
    • a parse action that dynamically defines the contents of a related parser element
    • attaching that parse action to an element in an Or expression

    MatchFirst (created using the '|' operator) can just crunch through its list of alternatives, trying to parse each in turn and returning as soon as one succeeds. In doing so, if an element has a parse action, that action runs as soon as the element parses successfully, as expected. In your case, this parse action injects the correct numeric expression for the value part of the binary, hex, or decimal value.

    But Or can't follow this same strategy. When writing pyparsing, I could not predict if any given parse action has side effects or other implications besides working just with the parsed tokens. So when Or goes through its alternatives, looking for the longest match, it has to do so without calling parse actions. If there is a parse action that updates a dynamic element in the parser, that action won't be called until the successful alternative is selected. Since you rely on the parse action to complete the parser, this will fail if the trigger that defines the dynamic expression is part of an Or.
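    The difference can be reproduced with a much smaller parser. The following is only a sketch: the names build_parser, fill_value, tag, and other are made up for illustration, and it assumes a pyparsing version that still accepts the camelCase method names used in the question.

```python
import pyparsing as pp

def build_parser(use_or):
    """Build a fresh parser; a parse action on `tag` fills in `value`."""
    value = pp.Forward()          # starts out empty, like verilog_lits
    tag = pp.Literal("d")

    def fill_value(toks):
        # Side effect: define the Forward only after `tag` has matched.
        value << pp.Word(pp.nums)

    tag.addParseAction(fill_value)
    number = tag + value
    other = pp.Literal("demo")
    return (number ^ other) if use_or else (number | other)

# MatchFirst runs fill_value as soon as `tag` matches,
# so `value` is defined by the time it is needed.
first_match = build_parser(use_or=False).parseString("d42").asList()

# Or probes its alternatives with parse actions switched off,
# so `value` is still empty and no alternative matches.
try:
    build_parser(use_or=True).parseString("d42")
    or_failed = False
except pp.ParseException:
    or_failed = True
```

    A fresh parser is built for each case because the Forward stays filled once the parse action has run.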

    Based on this, I refactored your definition of "a radix followed by its type-specific allowed value" by replacing:

    return (size + apos + radix + verilog_lits.setResultsName('value')).addParseAction(size_mask)
    

    with

    radix_int = pp.ungroup(pp.CaselessLiteral('d').suppress() + dec_num |
                           pp.CaselessLiteral('h').suppress() + hex_num |
                           pp.CaselessLiteral('b').suppress() + bin_num)
    return (size + apos + radix_int('value')).addParseAction(size_mask)
    

    This may not have the pizzazz of a dynamic subexpression, but by expanding the dynamic expression into a set of three specific alternatives, this expression is now safe to include in an "evaluate all and choose the longest" Or expression.
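    Put together, the refactored parser might look like the sketch below. It is not a drop-in replacement: the helper _to_int and the extra "4'hFF" input are additions for illustration, the print calls from size_mask are omitted, and it assumes a pyparsing version that still accepts the camelCase method names used in the question.

```python
import pyparsing as pp

def _to_int(base):
    """Hypothetical helper: parse action that strips '_' and converts to int."""
    return lambda toks: int(toks[0].replace('_', ''), base)

def get_verilog_num_parser():
    apos = pp.Suppress("'")
    dec_num = pp.Word(pp.nums + '_').setParseAction(_to_int(10))
    hex_num = pp.Word(pp.hexnums + '_').setParseAction(_to_int(16))
    bin_num = pp.Word('01_').setParseAction(_to_int(2))
    size = pp.Optional(dec_num).setResultsName('size')

    # Three static alternatives replace the Forward plus parse action.
    radix_int = pp.ungroup(pp.CaselessLiteral('d').suppress() + dec_num |
                           pp.CaselessLiteral('h').suppress() + hex_num |
                           pp.CaselessLiteral('b').suppress() + bin_num)

    def size_mask(toks):
        # Truncate the value to `size` bits when a size was given.
        size_bits = toks.get('size')
        if size_bits is not None:
            return toks['value'] & ((1 << size_bits) - 1)
        return toks['value']

    return (size + apos + radix_int('value')).addParseAction(size_mask)

# The Or operator from the question, now applied to the static version.
parser = get_verilog_num_parser() ^ pp.Literal("Some_demo_literal")

for text in ("8'd255", "'d255", "4'hFF", "Some_demo_literal"):
    print(text, "->", parser.parseString(text)[0])
```

    Because nothing in this version depends on a parse action having run, Or can probe both alternatives with parse actions disabled and still find the match.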