pyparsing

How to treat prefixed keywords?


I have the following parsing problem. Any of my keywords can be prefixed with an underscore to deactivate the corresponding option, block, etc.

#coding: utf8
from pyparsing import Keyword, Combine, pyparsing_common, Literal, Suppress, Group, OneOrMore

test_string = r'''
keyword1list {
    keyword1 {
        option 213
    }

    _keyword1 {
        option 214
    }
}
'''

This can happen to any keyword: here keyword1list, keyword1 or option. What I would like to achieve is either to leave those blocks out during parsing, or to parse them but capture the deactivation prefix.

Currently, I can successfully parse the "activated" part of test_string with the following code, but, as expected, it fails on the underscored keyword.

lparent = Suppress(Literal('{'))
rparent = Suppress(Literal('}'))

kw1_block = Keyword('keyword1') + lparent
kw1_block = kw1_block + Keyword('option') + pyparsing_common.number.setResultsName('option')
kw1_block = Group(kw1_block + rparent).setResultsName('keyw1')

kw2_block = Keyword('keyword1list') + lparent
kw2_block = kw2_block+ OneOrMore(kw1_block) + rparent
kw2_block = Group(kw2_block).setResultsName('keyword1list', listAllMatches=True)
result = kw2_block.parseString(test_string)
print(result.dump())
tmp = kw2_block.runTests(test_string.replace('\n', '\\n'))
print(tmp[0])  # runTests returns (success, results); this is the success flag

My current solution is to put all keywords in a list and build a dictionary that pairs each keyword with its underscored variant, tagging each match with a results name as a flag.

#coding: utf8
from pyparsing import Keyword, Combine, pyparsing_common, Literal, Suppress, Group, OneOrMore, ZeroOrMore

test_string = r'''
keyword1list {

    _keyword1 {
        option 1
    }
    keyword1 {
        option 2
    }
    _keyword1 {
        option 3
    }
    keyword1 {
        option 4
    }
    keyword1 {
        option 5
    }
}
'''


kwlist = ['keyword1', 'keyword1list', 'option']
keywords = {}
for k in kwlist:
    keywords[k] = Keyword('_' + k).setResultsName('deactivated') | Keyword(
        k).setResultsName('activated')

lparent = Suppress(Literal('{'))
rparent = Suppress(Literal('}'))

kw1_block = keywords['keyword1'] + lparent
kw1_block = kw1_block + keywords[
    'option'] + pyparsing_common.number.setResultsName('option') + rparent
kw1_block = Group(kw1_block).setResultsName('keyword1', listAllMatches=True)

kw2_block = keywords['keyword1list'] + lparent
kw2_block = kw2_block + ZeroOrMore(kw1_block) + rparent
kw2_block = Group(kw2_block).setResultsName('keyword1list')
result = kw2_block.parseString(test_string)
print(result.dump())
tmp = kw2_block.runTests(test_string.replace('\n', '\\n'))
print(tmp[0])  # runTests returns (success, results); this is the success flag

While this parses everything properly, I have to recreate the logic afterwards (finding the deactivated keywords and dropping them from the result), which I would like to avoid. I believe I need a parse action on the underscored keywords to drop those tokens somehow, but I currently cannot figure out how to do this.

Any help is greatly appreciated.


Solution

  • When I see a parser that is intended for filtering out selected blocks of text, my first approach is usually to write a parser that will match just the selected part, and then use transformString with a suppressed form of that parser:

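    from pyparsing import Keyword, MatchFirst, Word, nestedExpr, nums

    kwlist = ['keyword1', 'keyword1list', 'option']
    # match any deactivated (underscore-prefixed) keyword
    to_suppress = MatchFirst([Keyword('_' + k) for k in kwlist])
    # a keyword is followed either by a braced block or by a number
    kw_body = nestedExpr("{", "}") | Word(nums)

    # named "deactivated" rather than "filter" to avoid shadowing the builtin
    deactivated = (to_suppress + kw_body).suppress()
    print(deactivated.transformString(test_string))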
    

    Running this with your test string gives:

    keyword1list {
    
    
        keyword1 {
            option 2
        }
    
        keyword1 {
            option 4
        }
        keyword1 {
            option 5
        }
    }
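
As a follow-up sketch (not part of the original answer), the filtered text can then be passed to a grammar that only knows the activated keywords, so no post-processing of the parse results is needed. The shortened input and the names `sample`, `deactivated`, `cleaned` and `options` are illustrative assumptions; the grammar is a condensed form of the one in the question.

```python
from pyparsing import (Group, Keyword, MatchFirst, OneOrMore, Suppress,
                       Word, nestedExpr, nums, pyparsing_common)

# Shortened, hypothetical input in the same format as the question's test_string.
sample = '''
keyword1list {
    _keyword1 {
        option 1
    }
    keyword1 {
        option 2
    }
    _keyword1 {
        option 3
    }
    keyword1 {
        option 4
    }
}
'''

kwlist = ['keyword1', 'keyword1list', 'option']

# Step 1: remove every deactivated keyword together with its body.
to_suppress = MatchFirst([Keyword('_' + k) for k in kwlist])
kw_body = nestedExpr('{', '}') | Word(nums)
deactivated = (to_suppress + kw_body).suppress()
cleaned = deactivated.transformString(sample)

# Step 2: parse what is left with a grammar for activated keywords only.
lbrace, rbrace = Suppress('{'), Suppress('}')
kw1_block = Group(Keyword('keyword1') + lbrace + Keyword('option')
                  + pyparsing_common.number('option') + rbrace)
kw2_block = Group(Keyword('keyword1list') + lbrace
                  + OneOrMore(kw1_block) + rbrace)

result = kw2_block.parseString(cleaned)

# result[0] is the keyword1list group: the keyword itself, then one group per block.
options = [block['option'] for block in result[0][1:]]
print(options)
```

With this two-pass approach the main grammar stays free of any underscore handling; the trade-off is that the information about which blocks were deactivated is discarded before parsing rather than captured.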