I am trying to parse an input file in the following format:
file = """Begin 'big section header'
#... section contents ...
sub 1: value
sub 2: value
....
Begin 'interior section header'
....
End 'interior section header'
End 'big section header'"""
The goal is to return a list that greedily grabs everything between a section's Begin line and the End line with the same header:
['section header', ['section contents']]
My current attempt looks like this:
import pyparsing as pp
begin = pp.Keyword('Begin')
header = pp.Word(pp.alphanums+'_')
end = pp.Keyword('End')
content = begin.suppress() + header + pp.SkipTo(end + header)
content.searchString(file).asList()
This returns:
['section header', ['section contents terminated at the first end and generic header found']]
I suspect my grammar needs to be changed to some form of:
begin = pp.Keyword('Begin')
header = pp.Word(pp.alphanums+'_')
placeholder = pp.Forward()
end = pp.Keyword('End')
placeholder << begin.suppress() + header
content = placeholder + pp.SkipTo(end + header)
but I can't for the life of me figure out the correct assignment to the Forward object that doesn't give me what I already have.
Even easier than Forward in this case would be to use matchPreviousLiteral:
content = begin.suppress() + header + pp.SkipTo(end + pp.matchPreviousLiteral(header))
You are matching any end, but what you want is the end that matches the previous begin.
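To make this concrete, here is a minimal sketch of that approach run on a nested example. The sample input is made up, and it assumes headers are single words so they match the `header` expression from the question (quoted, multi-word headers would need a different `header` expression). It uses the camelCase names from the question; pyparsing 3 also offers snake_case equivalents such as `match_previous_literal`.

```python
import pyparsing as pp

# Hypothetical sample input; headers are single words so that they
# match pp.Word(pp.alphanums + '_') from the question.
text = """Begin outer
sub 1: value
Begin inner
sub 2: value
End inner
End outer"""

begin = pp.Keyword('Begin')
end = pp.Keyword('End')
header = pp.Word(pp.alphanums + '_')

# matchPreviousLiteral(header) matches only the exact text that
# `header` most recently matched, so SkipTo passes over "End inner"
# and stops at "End outer".
closing = end + pp.matchPreviousLiteral(header)
content = begin.suppress() + header + pp.SkipTo(closing)

result = content.searchString(text).asList()
name, body = result[0]
print(name)
print(body)
```

The skipped text still contains the whole inner section, so the nested Begin/End pair comes back as part of the outer body rather than as a separate match.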