Tags: python, pyparsing

Pyparsing failed to parse multiple rules


I am trying to create a Boolean query parser with some special rules, such as adjacent (ADJ) and near (nearN) operators. The rules I have created so far are:

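from pyparsing import (CaselessLiteral, Combine, OneOrMore, Optional,
                       StringEnd, Word, infixNotation, nums, opAssoc,
                       printables, quotedString, removeQuotes, replaceWith)

## DEFINITIONS OF SYMBOLS ###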
NEAR = CaselessLiteral('near').suppress()
NUMBER = Word(nums)
NONEDIRECTIONAL = Combine(NEAR+NUMBER)
ADJ = CaselessLiteral("ADJ").setParseAction(replaceWith('0'))
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")

## ----------------------- ##
## DEFINITIONS OF TERMS ###
# Do not break quoted string.
QUOTED = quotedString.setParseAction(removeQuotes)

# space-separated words are easiest to define using just OneOrMore
# must use a negative lookahead for and/not/or operators, and this must come
# at the beginning of the expression
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR | NONEDIRECTIONAL | ADJ) +
                          Word(printables, excludeChars="()"))

# use a parse action to recombine words into a single string
WORDWITHSPACE.addParseAction(lambda t: ' '.join(t))

TERM = (QUOTED | WORDWITHSPACE)
## ----------------------- ##
## DEFINITIONS OF EXPRESSION ###

EXPRESSION = infixNotation(TERM,
                           [
                               (ADJ, 2, opAssoc.LEFT),
                               (NONEDIRECTIONAL, 2, opAssoc.LEFT),
                               (ONOT, 1, opAssoc.RIGHT),
                               (Optional(OAND, default='and'), 2, opAssoc.LEFT),
                               (OOR, 2, opAssoc.LEFT)
                           ])
# As we can have more than one occurrence of symbols together we are
# using a `OneOrMore` expression

BOOLQUERY = OneOrMore(EXPRESSION) + StringEnd()
## ----------------------- ##

When I parse

((a or b) and (b and c)) or (a and d)

it works fine.

But when I try to parse

((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))

the code gets stuck and never finishes.

Can anyone help me find where I am going wrong?

Updated code:

EXPRESSION = infixNotation(TERM,
                           [
                               (ONOT, 1, opAssoc.RIGHT),
                               (Optional(OAND, default='and'), 2, opAssoc.LEFT),
                               ((OOR | NONEDIRECTIONAL | ADJ), 2, opAssoc.LEFT)
                           ])

I kept the optional AND operator because of inputs like

x not y not z


Solution

  • Your program is taking a long time because your infixNotation is 5 layers deep AND has an optional AND operator.

    I was able to run this as-is by just enabling packrat parsing. Do this by adding to the top of your script (right after importing pyparsing):

    ParserElement.enablePackrat()
    

    To run your tests, I used runTests. It was not clear to me why BOOLQUERY was necessary, since you are just parsing expressions:

    tests = """\
    ((a or b) and (b and c)) or (a and d)
    ((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))
    """
    EXPRESSION.runTests(tests)
    

    Gives:

    ((a or b) and (b and c)) or (a and d)
    [[[['a', 'or', 'b'], 'and', ['b', 'and', 'c']], 'or', ['a', 'and', 'd']]]
    [0]:
      [[['a', 'or', 'b'], 'and', ['b', 'and', 'c']], 'or', ['a', 'and', 'd']]
      [0]:
        [['a', 'or', 'b'], 'and', ['b', 'and', 'c']]
        [0]:
          ['a', 'or', 'b']
        [1]:
          and
        [2]:
          ['b', 'and', 'c']
      [1]:
        or
      [2]:
        ['a', 'and', 'd']
    
    
    ((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))
    [[[[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']], 'or', [['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]]]
    [0]:
      [[[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']], 'or', [['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]]
      [0]:
        [[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']]
        [0]:
          [['smart', '0', 'contract*'], 'and', 'agreement']
          [0]:
            ['smart', '0', 'contract*']
          [1]:
            and
          [2]:
            agreement
        [1]:
          or
        [2]:
          ['enforced', '3', 'without', '3', 'interaction']
        [3]:
          or
        [4]:
          ['automated', '0', 'escrow']
      [1]:
        or
      [2]:
        [['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]
        [0]:
          ['protocol*', 'or', ['Consensus', '0', 'algorithm']]
          [0]:
            protocol*
          [1]:
            or
          [2]:
            ['Consensus', '0', 'algorithm']
        [1]:
          5
        [2]:
          ['agreement', 'and', 'transaction']
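
To illustrate why packrat parsing helps with a deep precedence table, here is a minimal pure-Python sketch. It is a toy model and not pyparsing's actual implementation: the function name count_parse_calls, the single-digit operators, and the test string are all made up for this example. Each precedence level first looks ahead for "operand operator operand", then parses the operand again for the real match, and falls back to the bare operand otherwise, so the same operand is re-parsed at the same position at every level. Memoizing results by (level, position), which is the core idea of packrat parsing, removes that repeated work.

```python
def count_parse_calls(text, levels, memoize):
    """Parse `text` with a toy grammar that has `levels` precedence levels.

    Level k uses the digit character str(k) as its binary operator.
    Level 0 is an atom: one letter or a parenthesized top-level expression.
    Returns (end_position_or_None, number_of_rule_invocations).
    """
    calls = 0
    cache = {}

    def parse(level, pos):
        nonlocal calls
        key = (level, pos)
        if memoize and key in cache:
            return cache[key]          # packrat-style cache hit
        calls += 1
        result = parse_uncached(level, pos)
        if memoize:
            cache[key] = result
        return result

    def parse_uncached(level, pos):
        if level == 0:
            if pos < len(text) and text[pos] == "(":
                end = parse(levels, pos + 1)   # recurse into the top level
                if end is not None and end < len(text) and text[end] == ")":
                    return end + 1
                return None
            if pos < len(text) and text[pos].isalpha():
                return pos + 1
            return None

        op = str(level)
        # Lookahead: is there "operand op operand" at this position?
        first = parse(level - 1, pos)
        if first is not None and first < len(text) and text[first] == op:
            if parse(level - 1, first + 1) is not None:
                # Real match: parse the first operand again, then loop.
                end = parse(level - 1, pos)
                while end < len(text) and text[end] == op:
                    nxt = parse(level - 1, end + 1)
                    if nxt is None:
                        break
                    end = nxt
                return end
        # Fallback: this level has no operator here, so parse a bare operand.
        return parse(level - 1, pos)

    return parse(levels, 0), calls


text = "((a1b)2c)3d"   # hypothetical input: two nested groups, 5 levels
plain_end, plain_calls = count_parse_calls(text, levels=5, memoize=False)
memo_end, memo_calls = count_parse_calls(text, levels=5, memoize=True)
print("without memoization:", plain_calls, "rule invocations")
print("with memoization:   ", memo_calls, "rule invocations")
```

Both runs consume the whole string, but the memoized run needs at most one invocation per (level, position) pair, while the plain run roughly doubles its work at every level and multiplies it again for every pair of nested parentheses.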