pyparsing: ignore any token that doesn't match


I have this file from a game that I'm trying to parse. Here is an excerpt:

    <stage> id: 50  #Survival Stage
            <phase> bound: 1500  # phase 0   bandit
                    music: bgm\stage4.wma
                    id: 122  x: 100  #milk  ratio: 1
                    id: 30 hp: 50  times: 1
                    id: 30 hp: 50  times: 1  ratio: 0.7
                    id: 30 hp: 50  times: 1  ratio: 0.3
            <phase_end>
    <stage_end>

The # denotes a comment, but only to human readers, not to the game's parser. The first two comments run to the end of the line, but the ratio: 1 after #milk is not part of the comment; it actually counts. I think the game's parser simply ignores any token it can't understand. Is there a way to do this in pyparsing?

I tried using parser.ignore(pp.Word(pp.printables)) but that makes it skip over everything. Here's my code so far:

    import pyparsing as pp

    # Raw string so the backslash in bgm\stage4.wma is kept literally
    txt = r"""
    <stage> id: 50  #Survival Stage
            <phase> bound: 1500  # phase 0   bandit
                    music: bgm\stage4.wma
                    id: 122  x: 100  #milk  ratio: 1
                    id: 30 hp: 50  times: 1
                    id: 30 hp: 50  times: 1  ratio: 0.7
                    id: 30 hp: 50  times: 1  ratio: 0.3
            <phase_end>
    <stage_end>
    """

    phase = pp.Literal('<phase>')
    stage = pp.Literal('<stage>') + pp.Literal('id:') + pp.Word(pp.nums)('id') + pp.OneOrMore(phase)
    parser = stage

    # Attempt to skip unknown tokens; this ends up skipping everything
    parser.ignore(pp.Word(pp.printables))

    print(parser.parseString(txt).dump())

Solution

  • It turns out that in the stock game file only the ratio: keyword ever appears after a #, so I used that to define the end of a comment:

    parser.ignore(pp.Suppress('#') + pp.SkipTo(pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])))
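
To see the comment rule in action, here is a minimal sketch applied to single lines from the excerpt. The `key`, `value`, `field`, and `fields` expressions and the `parse_fields` helper are hypothetical names made up for this illustration, not part of the game's format or of the original parser; only the `comment` expression comes from the solution above. It assumes pyparsing is installed and that every field has the shape `key: value`.

```python
import pyparsing as pp

# Comment rule from the solution: a '#' followed by everything up to
# the next 'ratio:' keyword or the end of the line, whichever comes first.
comment = pp.Suppress('#') + pp.SkipTo(
    pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])
)

# Hypothetical field grammar (an assumption for this sketch):
# a name, a colon, and a single whitespace-delimited value.
key = pp.Word(pp.alphas + '_')
value = pp.Word(pp.printables)
field = pp.Group(key + pp.Suppress(':') + value)
fields = pp.OneOrMore(field)

# Register the comment as ignorable after the grammar is built,
# so it is propagated to every sub-expression.
fields.ignore(comment)


def parse_fields(line):
    """Return the key/value pairs found in one line as nested lists."""
    return fields.parseString(line).asList()


print(parse_fields("id: 122  x: 100  #milk  ratio: 1"))
print(parse_fields("id: 50  #Survival Stage"))
```

On the first line, `#milk` is skipped but `ratio: 1` still comes through as a field; on the second, the comment runs to the end of the line. Note that this only covers comments that end at `ratio:`. If a modded game file put some other keyword after a `#`, that keyword would be swallowed along with the comment.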