
pyparsing matching any combination of specified Literals


Example: I have the literals "alpha", "beta", "gamma". How do I make pyparsing parse the following inputs:

alpha
alpha|beta
beta|alpha|gamma

The given input can be constructed by using one or more non-repeating literals from a given set, separated by "|". Advice on setting up pyparsing will be appreciated.


Solution

  • Use the '&' operator, which builds an Each, instead of '+' or '|'. If all the literals must be present, but in an unpredictable order, use:

    Literal('alpha') & 'beta' & 'gamma'
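
    A minimal runnable check of the Each form. It assumes a pyparsing release that still accepts the camelCase names used throughout this answer (pyparsing 3 also offers snake_case spellings such as parse_string). Each does not consume the '|' separators by itself, so this sketch uses space-separated input:

```python
from pyparsing import Literal, ParseException

# '&' builds an Each: every element must appear exactly once, in any order.
# Each does not consume '|' separators, so this demo separates with spaces.
all_three = Literal('alpha') & 'beta' & 'gamma'

result = all_three.parseString("gamma alpha beta", parseAll=True)
print(result.asList())

# Leaving out a required element makes the whole match fail
try:
    all_three.parseString("alpha beta", parseAll=True)
    missing_rejected = False
except ParseException as exc:
    missing_rejected = True
    print("rejected:", exc)
```

    The tokens come back in the order they were found in the input, and leaving out 'gamma' raises a ParseException.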
    

    If some may be missing, but each may appear at most once, wrap them in Optional:

    Optional('alpha') & Optional('beta') & Optional('gamma')
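
    The same kind of sketch for the all-Optional version (again space-separated, same camelCase assumption). Any subset is accepted, while a repeated keyword is left unconsumed, so parseAll=True makes that parse fail:

```python
from pyparsing import Optional, ParseException

# Wrapping each keyword in Optional lets any subset appear, in any order,
# while Each still matches each keyword at most once
any_subset = Optional('alpha') & Optional('beta') & Optional('gamma')

parsed = {}
for text in ["alpha", "beta alpha", "beta alpha gamma"]:
    parsed[text] = any_subset.parseString(text, parseAll=True).asList()
    print(text, "->", parsed[text])

# The second 'alpha' is not consumed, so requiring the whole input fails
try:
    any_subset.parseString("alpha alpha", parseAll=True)
    duplicate_rejected = False
except ParseException:
    duplicate_rejected = True
```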
    

    Oops, I forgot the '|' delimiters. A more lenient parser would use a delimitedList:

    delimitedList(oneOf("alpha beta gamma"), '|')
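
    A quick check of the delimitedList version against the inputs from the question (delimitedList and oneOf remain available in pyparsing 3 alongside snake_case spellings such as one_of). The last input shows a duplicate slipping through:

```python
from pyparsing import delimitedList, oneOf

# oneOf builds one expression matching any of the space-separated words;
# delimitedList accepts one or more matches separated by '|' and drops the '|'
itemlist = delimitedList(oneOf("alpha beta gamma"), '|')

parsed = {}
for text in ["alpha", "alpha|beta", "beta|alpha|gamma", "alpha|alpha"]:
    parsed[text] = itemlist.parseString(text, parseAll=True).asList()
    print(text, "->", parsed[text])
```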
    

    This accepts any combination of your choices, but does not guard against duplicates. It may be simplest to reject those with a parse action:

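    from pyparsing import ParseException, delimitedList, oneOf

    itemlist = delimitedList(oneOf("alpha beta gamma"), '|')
    def ensureNoDuplicates(s, loc, tokens):
        # a set drops repeats, so a smaller set means some entry occurred twice
        if len(set(tokens)) != len(tokens):
            raise ParseException(s, loc, "duplicate list entries found")
    itemlist.setParseAction(ensureNoDuplicates)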
    

    This feels like the simplest approach to me.
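
    A small driver for the parse-action version. check is a hypothetical helper added only for this demo; it returns the keyword list, or None when the input is rejected:

```python
from pyparsing import ParseException, delimitedList, oneOf

itemlist = delimitedList(oneOf("alpha beta gamma"), '|')

def ensureNoDuplicates(s, loc, tokens):
    # a set drops repeats, so a smaller set means some entry occurred twice
    if len(set(tokens)) != len(tokens):
        raise ParseException(s, loc, "duplicate list entries found")

itemlist.setParseAction(ensureNoDuplicates)

def check(text):
    """Hypothetical demo helper: parsed keywords, or None if rejected."""
    try:
        return itemlist.parseString(text, parseAll=True).asList()
    except ParseException as exc:
        print(f"{text!r} rejected: {exc}")
        return None

for text in ["alpha", "alpha|beta", "beta|alpha|gamma", "beta|alpha|beta"]:
    print(text, "->", check(text))
```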

    EDIT:

    Later versions of pyparsing added parse-time conditions (addCondition), which make this kind of parse action easier to write:

    from pyparsing import delimitedList, oneOf

    itemlist = delimitedList(oneOf("alpha beta gamma"), '|')
    itemlist.addCondition(lambda tokens: len(set(tokens)) == len(tokens),
                          message="duplicate list entries found")
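
    The same kind of driver against the addCondition version (parse_or_none is a hypothetical helper for the demo). The error text is passed as the message keyword, because addCondition treats every positional argument as another condition function:

```python
from pyparsing import ParseException, delimitedList, oneOf

itemlist = delimitedList(oneOf("alpha beta gamma"), '|')
# the condition must return True for the match to be accepted;
# on failure a ParseException carrying `message` is raised
itemlist.addCondition(lambda tokens: len(set(tokens)) == len(tokens),
                      message="duplicate list entries found")

def parse_or_none(text):
    """Hypothetical demo helper: parsed keywords, or None if rejected."""
    try:
        return itemlist.parseString(text, parseAll=True).asList()
    except ParseException as exc:
        print(f"{text!r} rejected: {exc}")
        return None

for text in ["gamma|beta", "alpha|beta|alpha"]:
    print(text, "->", parse_or_none(text))
```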