
How to get pyparsing to match "1 day" or "2 days" but fail "1 days" and "2 day"?


I'm trying to match a sentence fragment of the form "after 3 days" or "after 1 month". I want to be particular with the single and plural forms, so "1 day" is valid but "1 days" is not.

I have the following code, which is nearly there, but the first two entries in the failure tests don't fail. I'd welcome suggestions that stay within the grammar notation: if possible, I'd like to avoid a set_parse_action() that checks the numeric value against the unit's plurality.

from pyparsing import *

units = Keyword('days') ^ Keyword('months')
unit  = Keyword('day') ^ Keyword('month')

single = Literal('1') + unit
multi = Word(nums) + units

after = Keyword('after') + ( single ^ multi )

a = after.run_tests('''
    after 1 day
    after 2 days
    after 1 month
    after 2 months
    ''')

print('=============')

b = after.run_tests('''
    after 1 days
    after 2 day
    after 1day
    after 2days
    ''', failure_tests = True)

print('Success tests', 'passed' if a[0] else 'failed')
print('Failure tests', 'passed' if b[0] else 'failed')

Solution

  • Only the case "after 1 days" parses when it should fail; the other three cases fail as expected.

    The issue is that multi = Word(nums) + units uses nums, which includes 1, so even though the singular branch rejects "1 days", the plural branch accepts it. In pyparsing, nums is defined as '0123456789', so one option is to leave the 1 out of the character set. This works for me:

    ...
    multi_nums = '023456789'  # nums excluding 1
    
    single = Literal('1') + unit
    multi = Word(multi_nums) + units
    ...
    

    EDIT:
    The above fails for multi-digit numbers that contain a 1, such as "10 days" or "21 days"; see comments. Fixed version as per comments:

    single = Literal('1') + unit
    multi = ~Keyword('1') + Word(nums) + units
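
To compare the two variants side by side, here is a minimal sketch that builds both grammars and checks a few inputs. It assumes pyparsing 3.x (for parse_string and parse_all); the helper accepts() and the names multi_digit_set, multi_lookahead, after_digit_set and after_lookahead are made up for illustration and do not appear in the question. The idea behind the fixed version is that ~Keyword('1') is a negative lookahead: Keyword('1') only matches a 1 that stands alone as a word, so a bare "1" is excluded from the plural branch while "10" or "21" still reach Word(nums).

```python
import pyparsing as pp

unit = pp.Keyword("day") ^ pp.Keyword("month")
units = pp.Keyword("days") ^ pp.Keyword("months")

single = pp.Literal("1") + unit

# First attempt from the answer: a digit set without '1'.
multi_digit_set = pp.Word("023456789") + units

# Fixed version from the answer: reject a standalone '1', allow any other number.
multi_lookahead = ~pp.Keyword("1") + pp.Word(pp.nums) + units

after_digit_set = pp.Keyword("after") + (single ^ multi_digit_set)
after_lookahead = pp.Keyword("after") + (single ^ multi_lookahead)


def accepts(parser, text):
    """Return True if parser matches the whole of text (illustrative helper)."""
    try:
        parser.parse_string(text, parse_all=True)
        return True
    except pp.ParseBaseException:
        return False


if __name__ == "__main__":
    samples = ["after 1 day", "after 2 days", "after 10 days",
               "after 21 days", "after 1 days", "after 2 day"]
    for text in samples:
        print(f"{text!r:18} digit set: {accepts(after_digit_set, text)!s:5} "
              f"lookahead: {accepts(after_lookahead, text)}")
```

Both variants should reject "after 1 days" and "after 2 day"; the difference shows up on inputs such as "after 10 days", which only the lookahead version should accept.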