I want to use pyparsing to split this:
Package:numpy11 Package:scipy
into this:
[["Package:", "numpy11"], ["Package:", "scipy"]]
My code so far is:
import pyparsing as pp
package_header = pp.Literal("Package:")
single_package = pp.Word(pp.printables + " ") + ~pp.Literal("Package:")
full_parser = pp.OneOrMore(pp.Group(package_header + single_package))
The current output is:
([(['Package:', 'numpy11 Package:scipy'], {})], {})
I was hoping for something like:
([(['Package:', 'numpy11'], {})], [(['Package:', 'scipy'], {})], {})
Essentially, the rest of the text matches pp.printables.
I am aware that I can use Word, but I want to match all printables except the "Package:" literal.
How do I accomplish this? Thank you.
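You shouldn't need the negative lookahead. Word(printables + " ") is greedy, so the ~Literal("Package:") lookahead is only tried after the Word has already consumed "numpy11 Package:scipy". Since printables contains no whitespace, plain Word(printables) stops at the space, and pyparsing skips that whitespace before matching the next "Package:" header: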
from pyparsing import Group, Literal, OneOrMore, Word, printables
package_header = Literal("Package:")
single_package = Word(printables)  # stops at whitespace, so it never reaches the next header
full_parser = OneOrMore(Group(package_header + single_package))
print(full_parser.parseString("Package:numpy11 Package:scipy"))
[['Package:', 'numpy11'], ['Package:', 'scipy']]
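If you really do want the "all printables but not the literal" behaviour from the question (names that may contain spaces, with no delimiter between packages), the negative lookahead has to go in front of each word of the name rather than after the whole name. A minimal sketch, assuming names are space-separated words and that no word inside a name starts with "Package:". Note that Combine rejoins the words with a single space, so runs of whitespace inside a name are normalised:

```python
from pyparsing import Combine, Group, Literal, OneOrMore, Word, printables

package_header = Literal("Package:")
# Accept a word only if it is not the start of the next "Package:" header.
name_word = ~package_header + Word(printables)
# Join the accepted words into one token; adjacent=False lets whitespace sit between them.
package_name = Combine(OneOrMore(name_word), joinString=" ", adjacent=False)
full_parser = OneOrMore(Group(package_header + package_name))

result = full_parser.parseString("Package:numpy11 foo Package:scipy")
print(result)  # [['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
```

Here the lookahead is checked before every word, so the OneOrMore stops as soon as the next header appears instead of after everything has been consumed.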
Update: to parse packages delimited by |, you can use the delimitedList() function (this also lets package names contain spaces):
from pyparsing import Group, Literal, Regex, delimitedList
package_header = Literal("Package:")
package_name = Regex(r'[^|]+')  # | is a printable, so use a regex that matches everything up to the next |.
package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|")
print(full_parser.parseString("Package:numpy11 foo|Package:scipy"))
[['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
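One thing to watch with that regex: [^|]+ also captures any spaces just before a |, so "Package:numpy11 foo |Package:scipy" would give 'numpy11 foo ' with a trailing space. A minimal sketch of one way to trim it, assuming leading and trailing spaces in a name are never significant, is a parse action that strips each matched name:

```python
from pyparsing import Group, Literal, Regex, delimitedList

package_header = Literal("Package:")
# Everything up to the next "|"; the parse action trims surrounding whitespace.
package_name = Regex(r"[^|]+").setParseAction(lambda toks: toks[0].strip())
package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|")

result = full_parser.parseString("Package:numpy11 foo | Package:scipy")
print(result)  # [['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
```

The whitespace after the | needs no special handling, because pyparsing skips it before trying the next "Package:" header.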