I have this file from a game that I'm trying to parse. Here is an excerpt:
<stage> id: 50 #Survival Stage
<phase> bound: 1500 # phase 0 bandit
music: bgm\stage4.wma
id: 122 x: 100 #milk ratio: 1
id: 30 hp: 50 times: 1
id: 30 hp: 50 times: 1 ratio: 0.7
id: 30 hp: 50 times: 1 ratio: 0.3
<phase_end>
<stage_end>
The # denotes a comment, but only to human readers, not to the game's parser. The first two comments run to the end of the line, but the ratio: 1 after #milk is not part of the comment; it actually counts. I think the game's parser ignores any tokens it can't understand. Is there a way to do this in pyparsing?
I tried using parser.ignore(pp.Word(pp.printables)), but that makes it skip over everything. Here's my code so far:
import pyparsing as pp
txt = r"""
<stage> id: 50 #Survival Stage
<phase> bound: 1500 # phase 0 bandit
music: bgm\stage4.wma
id: 122 x: 100 #milk ratio: 1
id: 30 hp: 50 times: 1
id: 30 hp: 50 times: 1 ratio: 0.7
id: 30 hp: 50 times: 1 ratio: 0.3
<phase_end>
<stage_end>
"""
phase = pp.Literal('<phase>')
stage = pp.Literal('<stage>') + pp.Literal('id:') + pp.Word(pp.nums)('id') + pp.OneOrMore(phase)
parser = stage
parser.ignore(pp.Word(pp.printables))
print(parser.parseString(txt).dump())
It turns out that in the stock game file only the ratio: keyword ever appears after a #, so I used that to define the end of a comment, like so:
parser.ignore(pp.Suppress('#') + pp.SkipTo(pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])))
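For anyone who wants to try this end to end, here is a minimal sketch of how that ignore expression could be combined with a simple key/value grammar. The names comment, key, value, pair, phase and stage, and the idea of collecting each phase as a flat list of key/value pairs, are my own assumptions for illustration; the real file format may need something richer. The sample text is a raw string so that the backslash in bgm\stage4.wma is kept literally.

```python
import pyparsing as pp

# Raw string: keeps the backslash in the music path as-is.
txt = r"""
<stage> id: 50 #Survival Stage
<phase> bound: 1500 # phase 0 bandit
music: bgm\stage4.wma
id: 122 x: 100 #milk ratio: 1
id: 30 hp: 50 times: 1
id: 30 hp: 50 times: 1 ratio: 0.7
id: 30 hp: 50 times: 1 ratio: 0.3
<phase_end>
<stage_end>
"""

# A comment starts at '#' and stops just before 'ratio:' or at the end
# of the line, whichever comes first (assumption: only 'ratio:' can
# follow a comment on the same line, as in the stock game files).
comment = pp.Suppress('#') + pp.SkipTo(
    pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])
)

# Hypothetical grammar: 'key: value' pairs, where a value is any run of
# characters that are neither whitespace nor '#'.
key = pp.Word(pp.alphas + '_') + pp.Suppress(':')
value = pp.Regex(r'[^\s#]+')
pair = pp.Group(key + value)

phase = pp.Group(
    pp.Suppress('<phase>') + pp.ZeroOrMore(pair) + pp.Suppress('<phase_end>')
)
stage = (
    pp.Suppress('<stage>')
    + pp.Group(pp.ZeroOrMore(pair))   # stage header pairs, e.g. id: 50
    + pp.OneOrMore(phase)
    + pp.Suppress('<stage_end>')
)

# Comments are skipped wherever they appear between tokens.
stage.ignore(comment)

result = stage.parseString(txt)
print(result.dump())
```

With this, the words after each # are dropped, while ratio: 1 after #milk still shows up as a normal pair in the phase. Note that whitespace (including newlines) is skipped by default, so this sketch does not preserve which pairs were on the same line.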