Tags: python, match, pyparsing

pyparsing: group matched and unmatched strings in the same order as the input text


I have a problem parsing my expression string. I want to identify all the identifiers in the input string using pyparsing.

identifier=pyparsing_common.identifier

My input string is '1+2*xyz*abc/5', and I am parsing it with:

identifier.parseString('1+2*xyz*abc/5')

I want the following output:

[['1+2*'],['xyz'],['*'],['abc'],['/5']]

Can anyone please help me achieve this?

Thanks in advance


Solution

  • Here are several code samples showing alternative ways to tackle your problem (using pyparsing version 2.4.7).

    Using your input string (assigned to input_string) and your identifier definition:

    >>> import pyparsing as pp
    >>> input_string = "1+2*xyz*abc/5"
    >>> identifier = pp.pyparsing_common.identifier
    

    Using identifier.split() (similar to re.split) to get the parts of the input string:

    >>> print(list(identifier.split(input_string, includeSeparators=True)))
    ['1+2*', 'xyz', '*', 'abc', '/5']
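
    To get exactly the nested shape asked for in the question, one option is to wrap each piece returned by split() in its own list. This is a minimal sketch assuming the same pyparsing API used above; the names parts and nested are only illustrative.

```python
import pyparsing as pp

input_string = "1+2*xyz*abc/5"
identifier = pp.pyparsing_common.identifier

# split() yields the unmatched text between identifiers and, because
# includeSeparators=True, the matched identifiers too, in input order.
parts = list(identifier.split(input_string, includeSeparators=True))

# Wrap each piece in its own list to get the nested shape from the question.
nested = [[part] for part in parts]
print(nested)  # [['1+2*'], ['xyz'], ['*'], ['abc'], ['/5']]
```

    Note that split() yields an empty string before a leading identifier or after a trailing one, so an input such as 'xyz*2' would start with a [''] entry; filter those out if you don't want them.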
    

    Using identifier.searchString() to return a ParseResults for each match:

    >>> print(identifier.searchString(input_string))
    [['xyz'], ['abc']]
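
    If you want plain Python lists rather than ParseResults objects (for example, to compare against expected values), asList() converts the whole nested result. A short sketch; matches is an illustrative name.

```python
import pyparsing as pp

input_string = "1+2*xyz*abc/5"
identifier = pp.pyparsing_common.identifier

# searchString() returns one ParseResults per match; asList() turns the
# whole structure into ordinary nested Python lists.
matches = identifier.searchString(input_string).asList()
print(matches)  # [['xyz'], ['abc']]
```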
    

    Using the sum() built-in to combine the matches into a single ParseResults:

    >>> print(sum(identifier.searchString(input_string)))
    ['xyz', 'abc']
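
    One caveat with sum(): it starts from 0, so when searchString() finds no matches the result is the integer 0 rather than an empty ParseResults. A list comprehension avoids that edge case. find_identifiers is a hypothetical helper name used only for this sketch.

```python
import pyparsing as pp

identifier = pp.pyparsing_common.identifier

def find_identifiers(text):
    # Take the single token from each per-match ParseResults; unlike
    # sum(), this still returns a (possibly empty) list with no matches.
    return [tokens[0] for tokens in identifier.searchString(text)]

print(find_identifiers("1+2*xyz*abc/5"))  # ['xyz', 'abc']
print(find_identifiers("1+2*3"))          # []
```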
    

    Using the locatedExpr helper method to wrap identifier, so that each match produces a group containing the matched value, plus the start and end locations:

    >>> print(sum(pp.locatedExpr(identifier).searchString(input_string)))
    [[4, 'xyz', 7], [8, 'abc', 11]]
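
    The start and end locations are enough to rebuild the interleaved output from the question, by slicing out the unmatched text between matches. This is a sketch of that idea rather than a replacement for split(); pieces and last are illustrative names, and it relies on the locn_start, value and locn_end names that locatedExpr assigns.

```python
import pyparsing as pp

input_string = "1+2*xyz*abc/5"
located = pp.locatedExpr(pp.pyparsing_common.identifier)

pieces = []
last = 0  # end offset of the previous match
for tokens in located.searchString(input_string):
    group = tokens[0]  # the Group created by locatedExpr
    start, end = group["locn_start"], group["locn_end"]
    if start > last:
        pieces.append([input_string[last:start]])  # unmatched text before it
    pieces.append([group["value"]])                # the identifier itself
    last = end
if last < len(input_string):
    pieces.append([input_string[last:]])           # trailing unmatched text

print(pieces)  # [['1+2*'], ['xyz'], ['*'], ['abc'], ['/5']]
```

    Unlike split(), this version skips empty gaps, so an input that starts or ends with an identifier does not produce [''] entries.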
    

    Using dump() to show the values as a list, and the named results in each subgroup:

    >>> print(sum(pp.locatedExpr(identifier).searchString(input_string)).dump())
    [[4, 'xyz', 7], [8, 'abc', 11]]
    [0]:
      [4, 'xyz', 7]
      - locn_end: 7
      - locn_start: 4
      - value: 'xyz'
    [1]:
      [8, 'abc', 11]
      - locn_end: 11
      - locn_start: 8
      - value: 'abc'
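
    The same named results can be read by key inside each group, for example to collect (value, start, end) tuples. A minimal sketch; records is an illustrative name.

```python
import pyparsing as pp

input_string = "1+2*xyz*abc/5"
located = pp.locatedExpr(pp.pyparsing_common.identifier)

records = []
for tokens in located.searchString(input_string):
    group = tokens[0]  # the Group created by locatedExpr
    records.append((group["value"], group["locn_start"], group["locn_end"]))

print(records)  # [('xyz', 4, 7), ('abc', 8, 11)]
```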