python-2.7, pyparsing

python, pyparsing, stopOn and repeating structures


The time has come to brush up on my pyparsing skills.

Given a file containing repetitive structures:

space_missions
Main Objects:
  /Projects/antares_III
  /Projects/apollo
ground_missions
Main Objects:
  /Projects/Barbarossa
  /Projects/Desert_Eagle

and my chopped-down Python 2.7 script:

import pyparsing as pp

def last_occurance_of( expr):
  return expr + ~pp.FollowedBy( expr)

ppKeyName = pp.Word( pp.alphanums)
ppObjectLabel = pp.Literal("Main Objects") + pp.FollowedBy(':')
ppObjectRegex = pp.Regex(r'\/Projects\/\w+')
ppTag = pp.Group( ppKeyName.setResultsName('keyy') + pp.Suppress( ppObjectLabel) + pp.ZeroOrMore( ppObjectRegex, stopOn=last_occurance_of( ppObjectRegex)).setResultsName('objects') )
ppTags = pp.OneOrMore( ppTag)
with open( fn) as fp:
  slurp = fp.read()
results = ppTags.parseString( slurp)

I'd like results to contain:

[['space_missions', ['/Projects/antares_III', '/Projects/apollo']],
 ['ground_missions', ['/Projects/Barbarossa', '/Projects/Desert_Eagle']]]

So what am I missing here? I realize I'm lucky that the strings in the lists all start the same way, which gives last_occurance_of() something to lock on to. But what does one do in the more general case, where nothing differentiates the list strings from the tag strings?

Still-Searching Steve


Solution

  • Three things to fix in your parser:

    1. Your sample key names contain underscores, but the definition of ppKeyName does not allow them.

    2. ppObjectLabel matches "Main Objects" only when it is followed by a ':', but pp.FollowedBy is a lookahead that does not consume the ':', so nothing ever parses it. The easiest fix is to add the ':' to ppObjectLabel directly instead of using pp.FollowedBy.

    3. last_occurance_of is unnecessary: the repetition of ppObjectRegex stops on its own, because the next tag's ppKeyName does not match ppObjectRegex.
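Putting the three fixes together, a minimal sketch of the corrected parser might look like the following. The inline `sample` string is a hypothetical stand-in for the contents of your file, and the extra pp.Group around the repetition is my own assumption about how to get the nested list shown in the question; it is not one of the three fixes above.

```python
import pyparsing as pp

# Hypothetical stand-in for the file contents read into `slurp`.
sample = """\
space_missions
Main Objects:
  /Projects/antares_III
  /Projects/apollo
ground_missions
Main Objects:
  /Projects/Barbarossa
  /Projects/Desert_Eagle
"""

# Fix 1: key names may contain underscores.
ppKeyName = pp.Word(pp.alphanums + '_')

# Fix 2: parse the ':' as part of the label instead of only looking ahead.
ppObjectLabel = pp.Literal("Main Objects") + ':'

ppObjectRegex = pp.Regex(r'/Projects/\w+')

# Fix 3: no stopOn; the repetition ends when the regex stops matching.
# Assumption: the inner pp.Group nests the objects in their own sublist.
ppTag = pp.Group(
    ppKeyName.setResultsName('keyy')
    + pp.Suppress(ppObjectLabel)
    + pp.Group(pp.ZeroOrMore(ppObjectRegex)).setResultsName('objects')
)
ppTags = pp.OneOrMore(ppTag)

results = ppTags.parseString(sample)
print(results.asList())
```

Each group also keeps its results names, so results[0]['keyy'] and results[0]['objects'] can be used instead of positional indexing.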