pyparsing

How to parse and return a hierarchy of keywords?


The keywords for a command I want to parse have a hierarchy, for example:

The keywords 'aaa' and 'bbb' would belong to 'product1', and 'ccc' and 'ddd' would belong to 'product2'. Overall, 'product1' and 'product2' belong to 'product'.

When the user inputs a string such as 'ccc run X', I want the parser to include the following in the dump:

Product: product2 

I tried to work out how to construct this hierarchy based on "Parse and group multiple items together using Pyparse", but can't seem to think of a solution. Can someone please point me to a relevant example of pyparsing elements suitable for this?

Thanks


Solution

  • I think a parse action is the best place to add items like this to the tokens. A parse action is called with the parsed tokens, and in its body you can add new named results simply by assigning to the tokens through their dict interface, as in tokens['Product'] = 'product2'.

    I mocked up this simple parser to parse your command:

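    import pyparsing as pp
    
    # leading keyword that determines the product
    cmd_prefix = pp.oneOf("aaa bbb ccc ddd")
    action_expr = pp.oneOf("run hold cancel submit pause resume")
    # pp.empty() skips whitespace so restOfLine starts at the first qualifier character
    cmd_expr = (cmd_prefix("prefix")
                + action_expr("action")
                + pp.empty() + pp.restOfLine("qualifiers"))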
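
A quick way to see why pp.empty() is in cmd_expr: pyparsing's restOfLine is built with leaveWhitespace(), so unless something in front of it skips whitespace, the captured text keeps the space after the action keyword. A minimal comparison using the same keyword lists (no_empty and with_empty are just illustrative names):

```python
import pyparsing as pp

cmd_prefix = pp.oneOf("aaa bbb ccc ddd")
action_expr = pp.oneOf("run hold cancel submit pause resume")

# restOfLine does not skip leading whitespace, so the space before "X" is kept
no_empty = (cmd_prefix("prefix") + action_expr("action")
            + pp.restOfLine("qualifiers"))

# pp.empty() matches nothing but does skip whitespace first,
# so restOfLine starts at "X"
with_empty = (cmd_prefix("prefix") + action_expr("action")
              + pp.empty() + pp.restOfLine("qualifiers"))

print(repr(no_empty.parseString("aaa run X").qualifiers))    # ' X'
print(repr(with_empty.parseString("aaa run X").qualifiers))  # 'X'
```

Calling .strip() in a parse action would also work; pp.empty() just keeps the fix inside the grammar.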
    

    Running your sample command as a test:

    cmd_expr.runTests("""\
        aaa run X
        """)
    

    Gives:

    aaa run X
    ['aaa', 'run', 'X']
    - action: 'run'
    - prefix: 'aaa'
    - qualifiers: 'X'
    

    We can add a parse action to your cmd_expr to embellish the results with additional entries. To keep the code and data separate, here is a dict that defines a few added items based on the prefix:

    prefix_items = {
        'aaa': {'Product': 'product1', 'Material': 'paper'},
        'bbb': {'Product': 'product1', 'Material': 'wool'},
        'ccc': {'Product': 'product2', 'Material': 'wood'},
        'ddd': {'Product': 'product2', 'Material': 'plastic'},
    }
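
The question describes the keywords as a nested hierarchy rather than a flat table. If you would rather write the data that way, a sketch like the following could generate a flat prefix_items dict from it. It assumes exactly two levels below the top (category, then group, then keywords), and hierarchy and flatten_hierarchy are hypothetical names; entries such as 'Material' that are not part of the hierarchy would still have to be merged in separately.

```python
# Hypothetical nested layout mirroring the question's hierarchy:
# 'product' -> 'product1' -> aaa, bbb and 'product' -> 'product2' -> ccc, ddd
hierarchy = {
    "product": {
        "product1": ["aaa", "bbb"],
        "product2": ["ccc", "ddd"],
    },
}


def flatten_hierarchy(tree):
    """Build {keyword: {Category: group}} from a two-level nested dict."""
    items = {}
    for category, groups in tree.items():
        for group, keywords in groups.items():
            for keyword in keywords:
                # e.g. 'ccc' -> {'Product': 'product2'}
                items.setdefault(keyword, {})[category.capitalize()] = group
    return items


prefix_items = flatten_hierarchy(hierarchy)
print(prefix_items["ccc"])  # {'Product': 'product2'}
```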
    

    And this parse action will add them to the parsed results:

    def add_prefix_items(tokens):
        # find dict of items to add
        adders = prefix_items.get(tokens.prefix, {})
    
        # for each key-value in dict, add to the parsed tokens
        for name, value in adders.items():
            tokens[name] = value
    
    cmd_expr.addParseAction(add_prefix_items)
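
To get exactly the output asked for in the question (Product: product2 for 'ccc run X'), you can also read the added name straight off the returned ParseResults instead of going through runTests. A self-contained sketch combining the pieces above:

```python
import pyparsing as pp

cmd_prefix = pp.oneOf("aaa bbb ccc ddd")
action_expr = pp.oneOf("run hold cancel submit pause resume")
cmd_expr = (cmd_prefix("prefix")
            + action_expr("action")
            + pp.empty() + pp.restOfLine("qualifiers"))

prefix_items = {
    "aaa": {"Product": "product1", "Material": "paper"},
    "bbb": {"Product": "product1", "Material": "wool"},
    "ccc": {"Product": "product2", "Material": "wood"},
    "ddd": {"Product": "product2", "Material": "plastic"},
}


def add_prefix_items(tokens):
    # add each extra name/value for this prefix as a new named result
    for name, value in prefix_items.get(tokens.prefix, {}).items():
        tokens[name] = value


cmd_expr.addParseAction(add_prefix_items)

# the question's example input
result = cmd_expr.parseString("ccc run X")
print("Product:", result.Product)  # Product: product2
print(result.asDict())
```

The added entries exist only as named results, so the token list itself stays ['ccc', 'run', 'X'].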
    

    Here are some more tests and output:

    cmd_expr.runTests("""\
        aaa run X
        ddd hold Z
        eee resume A
        """)
    

    Gives:

    aaa run X
    ['aaa', 'run', 'X']
    - Material: 'paper'
    - Product: 'product1'
    - action: 'run'
    - prefix: 'aaa'
    - qualifiers: 'X'
    
    
    ddd hold Z
    ['ddd', 'hold', 'Z']
    - Material: 'plastic'
    - Product: 'product2'
    - action: 'hold'
    - prefix: 'ddd'
    - qualifiers: 'Z'
    
    eee resume A
    ^
    FAIL: Expected aaa | bbb | ccc | ddd (at char 0), (line:1, col:1)
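
The 'eee' line fails in cmd_prefix, before the parse action ever runs, because cmd_prefix and prefix_items list the same keywords separately. One way to keep the two from drifting apart, offered as a suggestion rather than a requirement, is to build cmd_prefix from the dict's keys; oneOf accepts a list of strings as well as a space-separated string:

```python
import pyparsing as pp

prefix_items = {
    "aaa": {"Product": "product1", "Material": "paper"},
    "bbb": {"Product": "product1", "Material": "wool"},
    "ccc": {"Product": "product2", "Material": "wood"},
    "ddd": {"Product": "product2", "Material": "plastic"},
}

# derive the prefix keywords from the data, so adding a prefix
# only needs a new dict entry
cmd_prefix = pp.oneOf(list(prefix_items))
action_expr = pp.oneOf("run hold cancel submit pause resume")
cmd_expr = (cmd_prefix("prefix")
            + action_expr("action")
            + pp.empty() + pp.restOfLine("qualifiers"))


def add_prefix_items(tokens):
    # every prefix the parser accepts is a dict key, so index directly
    for name, value in prefix_items[tokens.prefix].items():
        tokens[name] = value


cmd_expr.addParseAction(add_prefix_items)

print(cmd_expr.parseString("ddd hold Z").Product)  # product2

try:
    cmd_expr.parseString("eee resume A")
except pp.ParseException as exc:
    print("rejected at char", exc.loc)
```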
    

    If this list gets long, you might end up having to read it from a database of some kind. Here is a little in-memory database example (using another open source lib of mine, littletable):

    import littletable as lt
    
    # create simple in-memory database table, indexed by prefix
    prefix_items = lt.Table().create_index('prefix').csv_import("""\
    prefix,name,value
    aaa,Product,product1
    aaa,Material,paper
    bbb,Product,product1
    bbb,Material,wool
    ccc,Product,product2
    ccc,Material,wood
    ddd,Product,product2
    ddd,Material,plastic
    """)
    
    def add_prefix_items_from_table(t):
        # get all entries in the table with matching key
        # (in a SQL database, this would be some kind of SELECT query)
        adders = prefix_items.by.prefix[t.prefix]
    
        # for each matching record, add the item-value to the parsed tokens
        for rec in adders:
            t[rec.name] = rec.value
    
    # clear previous parse action and add new one
    cmd_expr.setParseAction()
    cmd_expr.addParseAction(add_prefix_items_from_table)
    

    Gives the same results as shown previously.
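
The comment in add_prefix_items_from_table notes that a SQL database would use a SELECT query. As a rough sketch of that variant, here is the same lookup against an in-memory SQLite database from Python's standard sqlite3 module; the table name, column names and add_prefix_items_from_sql are illustrative assumptions:

```python
import sqlite3

import pyparsing as pp

# in-memory SQLite database; table and column names are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefix_items (prefix TEXT, name TEXT, value TEXT)")
conn.execute("CREATE INDEX prefix_idx ON prefix_items (prefix)")
conn.executemany(
    "INSERT INTO prefix_items (prefix, name, value) VALUES (?, ?, ?)",
    [
        ("aaa", "Product", "product1"), ("aaa", "Material", "paper"),
        ("bbb", "Product", "product1"), ("bbb", "Material", "wool"),
        ("ccc", "Product", "product2"), ("ccc", "Material", "wood"),
        ("ddd", "Product", "product2"), ("ddd", "Material", "plastic"),
    ],
)

cmd_prefix = pp.oneOf("aaa bbb ccc ddd")
action_expr = pp.oneOf("run hold cancel submit pause resume")
cmd_expr = (cmd_prefix("prefix")
            + action_expr("action")
            + pp.empty() + pp.restOfLine("qualifiers"))


def add_prefix_items_from_sql(tokens):
    # select every name/value row stored for this prefix
    rows = conn.execute(
        "SELECT name, value FROM prefix_items WHERE prefix = ?",
        (tokens.prefix,),
    )
    for name, value in rows:
        tokens[name] = value


cmd_expr.addParseAction(add_prefix_items_from_sql)

result = cmd_expr.parseString("ccc run X")
print(result.Product, result.Material)  # product2 wood
```

With an on-disk database you would pass a file path to sqlite3.connect() and load the rows once, outside the parser.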