python pyparsing word excludeChars


I am trying to write a parser for a number that may contain an '_'. I would like the underscore to be suppressed in the output. For example, a valid input would be 1000_000, which should return the number 1000000. I tried the excludeChars keyword argument for this, since my understanding is that it does the following:

"If supplied, this argument specifies characters not to be considered to match, even if those characters are otherwise considered to match."

Taken from http://infohost.nmt.edu/tcc/help/pubs/pyparsing/pyparsing.pdf - page 33 section 5.35 (great pyparsing reference btw)

So below is my attempt:

import pyparsing as pp
num = pp.Word(pp.nums+'_', excludeChars='_')
num.parseString('123_4')

but I end up with the result '123' instead of '1234'

In [113]: num.parseString('123_4')
Out[113]: (['123'], {})

Any suggestions?


Solution

  • You are misinterpreting the purpose of excludeChars. It does not suppress those characters from the output; it overrides the characters given in the initial and body character strings, removing them from the set of characters the Word will match. So this

    Word(nums+'_', excludeChars='_')
    

    is just the same as

    Word(nums)
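
A quick way to check this equivalence is to run both expressions on the input from the question. This is a minimal sketch, assuming pyparsing is installed; the variable names are illustrative only.

```python
import pyparsing as pp

# Two definitions that should accept exactly the same characters.
with_exclude = pp.Word(pp.nums + "_", excludeChars="_")
plain = pp.Word(pp.nums)

sample = "123_4"

# Both stop at the underscore, because '_' is not in either character set.
result_with_exclude = with_exclude.parseString(sample).asList()
result_plain = plain.parseString(sample).asList()

print(result_with_exclude, result_plain)
```

Both parsers match only the leading digits, which is why the question's attempt returned '123'.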
    

    excludeChars was added because there were many times that users wanted to define words like:

    • all printables except for ':'
    • all printables except for ',' or '.'
    • all printables except for ...

    Before excludeChars was added, the only way to do this was the clunky-looking:

    Word(''.join(c for c in printables if c != ':'))
    

    or

    Word(printables.replace(',',''))
    

    Instead you can now write

    Word(printables, excludeChars=',.')
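
As a rough illustration of the intended use, the sketch below compares the manual filtering approach with the excludeChars form. It assumes pyparsing is installed; the sample string and variable names are made up for the example.

```python
import pyparsing as pp

excluded = ",."

# Older style: build the allowed character string by hand.
filtered = "".join(c for c in pp.printables if c not in excluded)
manual = pp.Word(filtered)

# Same character set, expressed with excludeChars.
concise = pp.Word(pp.printables, excludeChars=excluded)

sample = "key=value,next.item"

# Each Word stops at the first excluded character (the comma here).
manual_result = manual.parseString(sample).asList()
concise_result = concise.parseString(sample).asList()

print(manual_result, concise_result)
```

The two expressions should behave identically; excludeChars only saves you from building the filtered string yourself.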
    

    In your case, you want to parse the numeric value, allowing embedded '_'s, but return just the numerics. This would be a good case for a parse action:

    integer = Word(nums+'_').setParseAction(lambda t: t[0].replace('_',''))
    

    Parse actions are called at parse time to do filtering and conversions. You can even include the conversion to int as part of your parse action:

    integer = Word(nums+'_').setParseAction(lambda t: int(t[0].replace('_','')))
    integer.parseString('1_000')  # --> [1000]
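
Putting the pieces together, here is a self-contained sketch of the parse-action approach. It uses a named function instead of a lambda, and it makes one assumption that goes beyond the answer above: the number must start with a digit, expressed by passing separate initial and body character strings to Word. The helper name `to_int` is hypothetical.

```python
import pyparsing as pp


def to_int(tokens):
    """Parse action: strip underscores from the matched text and convert to int."""
    return int(tokens[0].replace("_", ""))


# Assumption: the first character must be a digit; later characters may be
# digits or underscores. Word(nums + '_') alone would also accept '_' by itself.
integer = pp.Word(pp.nums, pp.nums + "_").setParseAction(to_int)

samples = ("1_000", "123_4", "1000_000")

# parseAll=True makes the parser reject trailing unparsed input.
values = [integer.parseString(s, parseAll=True)[0] for s in samples]

print(values)
```

Because the conversion happens inside the parse action, the parse results already contain Python ints rather than strings.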