
Splitting a string with pyparsing


I want to split a string like the one below, but using pyparsing:

Package:numpy11 Package:scipy
should be split into
[["Package:", "numpy11"], ["Package:", "scipy"]]

My code so far is

import pyparsing as pp

package_header = pp.Literal("Package:")
single_package = pp.Word(pp.printables + " ") + ~pp.Literal("Package:")
full_parser = pp.OneOrMore(pp.Group(package_header + single_package))

The current output is:

([(['Package:', 'numpy11 Package:scipy'], {})], {})

I was hoping for something like this:

([(['Package:', 'numpy11'], {}), (['Package:', 'scipy'], {})], {})

Essentially, the text after each header can be any character in pp.printables.

I am aware that I can use Word, but what I want to match is

all printables, but not the Literal "Package:"

How do I accomplish this? Thank you.


Solution

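  • You shouldn't need the negative lookahead. Word(printables) already stops at whitespace, because a space is not one of the printables; adding " " to the character set is what let the original expression swallow the rest of the line. I.e. this: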

    from pyparsing import *
    
    package_header = Literal("Package:")
    single_package =  Word(printables)
    full_parser  = OneOrMore( Group( package_header + single_package ) )
    
    print(full_parser.parseString("Package:numpy11 Package:scipy"))
    

    prints:

    [['Package:', 'numpy11'], ['Package:', 'scipy']]
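
If a negative lookahead is still wanted (for example, to allow several space-separated words in one package name), it has to come before each word rather than after a greedy Word. The following is only a sketch of that idea, not part of the original answer; the name name_word and the sample input with "foo" are made up for illustration:

```python
import pyparsing as pp

package_header = pp.Literal("Package:")

# One word of a package name: any run of printables, as long as the
# upcoming text is not the start of a new "Package:" header.
name_word = ~package_header + pp.Word(pp.printables)  # hypothetical helper name

# A package name is one or more such words.
single_package = pp.OneOrMore(name_word)
full_parser = pp.OneOrMore(pp.Group(package_header + single_package))

# Sample input invented for this sketch.
result = full_parser.parseString("Package:numpy11 foo Package:scipy").asList()
print(result)
```

Here each word of a name comes back as its own token inside the group, so "numpy11 foo" is returned as two tokens rather than one string.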
    

    Update: to parse packages delimited by |, you can use the delimitedList() function (this way package names can also contain spaces):

    from pyparsing import *
    
    package_header = Literal("Package:")
    package_name = Regex(r'[^|]+')  # | is a printable, so create a regex that excludes it.
    package = Group(package_header + package_name) 
    full_parser = delimitedList(package, delim="|" )
    
    print(full_parser.parseString("Package:numpy11 foo|Package:scipy"))
    

    prints:

    [['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
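
One thing to watch with the [^|]+ regex is that it also captures any spaces sitting right before a | or at the end of the input. A possible way to handle that, sketched here as an assumption rather than as part of the original answer, is to strip the name in a parse action. The sample input with spaces around the delimiter is invented for this sketch:

```python
import pyparsing as pp

package_header = pp.Literal("Package:")

# Everything up to the next "|" is the name; the parse action trims
# surrounding whitespace that the regex would otherwise keep.
package_name = pp.Regex(r"[^|]+").setParseAction(lambda tokens: tokens[0].strip())

package = pp.Group(package_header + package_name)
full_parser = pp.delimitedList(package, delim="|")

# Sample input invented for this sketch (note the spaces around "|").
result = full_parser.parseString("Package:numpy11 foo | Package:scipy ").asList()
print(result)
```

The camelCase names (setParseAction, parseString, delimitedList) are used because they match the answer above; newer pyparsing releases also provide snake_case equivalents, so check which spelling your installed version prefers.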