I am having trouble finding a way to ignore a structure when it is nested inside another type of structure. In the example below I am parsing for structure_a, but my results also include structure_a blocks that are nested inside another structure. I don't want pyparsing to match those unless I match the outer structure first. How would I go about doing that?
self.LBRACE, self.RBRACE, self.LBRACK, self.RBRACK, self.SEMI, self.COMMA, self.DOUBLEQUOTE = map(pp.Suppress, '{}[];,"')
def parse(self, data):
    template = CaselessKeyword("structure_a")
    words = Word(alphanums + "_" + "." + "[" + "]")
    recursive_grammar = Forward()
    recursive_grammar <<= (
        Group(words("type") + words("name") + self.LBRACE +
              ZeroOrMore(recursive_grammar) + self.RBRACE |
              words("name") + self.LBRACE +
              ZeroOrMore(recursive_grammar) + self.RBRACE |
              self.LBRACE + ZeroOrMore(recursive_grammar) + self.RBRACE |
              self.LBRACE + ZeroOrMore(words("type")) + self.RBRACE) |
        Group(words("name") + self.EQUAL + recursive_grammar |
              ZeroOrMore(words("type")) + words("name") + self.EQUAL +
              words("value") + Optional(self.COMMA) |
              words("name") + self.EQUAL + words("value") +
              Optional(self.COMMA))
    )
    grammar = (template("category") + words("type") + words("name") +
               self.LBRACE + ZeroOrMore(recursive_grammar)("members") +
               self.RBRACE + Optional(cStyleComment)("short_description"))
    result = grammar.searchString(data)
    return result
# I want to match this structure
structure_a type name {
    variable = 1
}
structure_b name {
    # I only want to match a nested structure_a if I create a new
    # grammar to match structure_b that have structure_a nested in it.
    # Otherwise I want to ignore nested structure_a
    structure_a type name {
        variable = 2
    }
}
Currently my grammar matches things inside structure_b as well as top-level elements. I don't want pyparsing to match anything inside structure_b unless I explicitly match structure_b first.
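A stripped-down reproduction of the behavior, as a sketch only: the grammar here is much simpler than the real one, and the names `word`, `assignment`, `outer`, `inner`, and `wrapper` are made up for the illustration. It shows searchString reporting the nested structure_a even though the grammar says nothing about structure_b.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "{}")
word = pp.Word(pp.alphanums + "_.")
# Simplified stand-in for the recursive member grammar: only "name = value".
assignment = pp.Group(word("name") + pp.Literal("=") + word("value"))

structure_a = (
    pp.CaselessKeyword("structure_a")("category")
    + word("type")
    + word("name")
    + LBRACE
    + pp.ZeroOrMore(assignment)("members")
    + RBRACE
)

data = """
structure_a type outer {
    variable = 1
}
structure_b wrapper {
    structure_a type inner {
        variable = 2
    }
}
"""

# searchString scans the whole input and reports every place the
# grammar matches, so the structure_a inside structure_b is found too.
matches = structure_a.searchString(data)
names = [m["name"] for m in matches]
print(names)  # ['outer', 'inner']
```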
After writing the question out, posting it, and taking some time away from the problem, I think I have come up with a solution. I think it was matching the nested structure_a because it could not find a match for the outer structure_b, so searchString just moved on through the text and found the nested structure_a without knowing it was nested. So I rewrote my code to this, and it seems to work.
self.LBRACE, self.RBRACE, self.LBRACK, self.RBRACK, self.SEMI, self.COMMA, self.DOUBLEQUOTE = map(pp.Suppress, '{}[];,"')
def parse(self, data):
    template1 = CaselessKeyword("structure_a")
    template2 = CaselessKeyword("structure_b")
    words = Word(alphanums + "_" + "." + "[" + "]")
    recursive_grammar = Forward()
    recursive_grammar <<= (
        Group(words("type") + words("name") + self.LBRACE +
              ZeroOrMore(recursive_grammar) + self.RBRACE |
              words("name") + self.LBRACE + ZeroOrMore(recursive_grammar) +
              self.RBRACE |
              self.LBRACE + ZeroOrMore(recursive_grammar) + self.RBRACE |
              self.LBRACE + ZeroOrMore(words("type")) + self.RBRACE |
              # added the nested structure to my recursive grammar
              template1("category") + words("type") + words("name") +
              self.LBRACE + ZeroOrMore(recursive_grammar)("members") +
              self.RBRACE + Optional(cStyleComment)("short_description")) |
        Group(words("name") + self.EQUAL + recursive_grammar |
              ZeroOrMore(words("type")) + words("name") + self.EQUAL +
              words("value") + Optional(self.COMMA) |
              words("name") + self.EQUAL + words("value") + Optional(self.COMMA))
    )
    grammar = (template1("category") + words("type") + words("name") +
               self.LBRACE + ZeroOrMore(recursive_grammar)("members") +
               self.RBRACE + Optional(cStyleComment)("short_description") |
               # Match structure_b
               template2("category") + words("name") + self.LBRACE +
               ZeroOrMore(recursive_grammar)("members") + self.RBRACE +
               Optional(cStyleComment)("short_description")
               )
    result = grammar.searchString(data)
    return result
# Same example from question...
structure_a type name {
    variable = 1
}
structure_b name {
    structure_a type name {
        variable = 2
    }
}
# Results...
Name: name
Category: structure_a
Type: type
['variable', '=', '1']
Name: name
Category: structure_b
Type:
['structure_a', 'type', 'name', ['variable', '=', '2']]
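For anyone who wants to try the idea in isolation, here is a cut-down sketch of the same fix. It is not my real grammar: `word`, `assignment`, `structure_a_body`, and the sample names `outer`, `inner`, and `wrapper` are simplifications invented for the example. The point is only that once structure_b is an alternative in the top-level grammar and consumes its whole body, searchString resumes scanning after the closing brace and never sees the nested structure_a on its own.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "{}")
word = pp.Word(pp.alphanums + "_.")
# Simplified stand-in for the recursive member grammar: only "name = value".
assignment = pp.Group(word("name") + pp.Literal("=") + word("value"))

structure_a_body = (
    pp.CaselessKeyword("structure_a")("category")
    + word("type")
    + word("name")
    + LBRACE
    + pp.ZeroOrMore(assignment)("members")
    + RBRACE
)

# structure_b consumes its entire body, including any nested structure_a.
# The nested structure_a is wrapped in Group so its result names do not
# overwrite the ones belonging to structure_b.
structure_b = (
    pp.CaselessKeyword("structure_b")("category")
    + word("name")
    + LBRACE
    + pp.ZeroOrMore(pp.Group(structure_a_body) | assignment)("members")
    + RBRACE
)

top_level = structure_a_body | structure_b

data = """
structure_a type outer {
    variable = 1
}
structure_b wrapper {
    structure_a type inner {
        variable = 2
    }
}
"""

matches = top_level.searchString(data)
categories = [m["category"] for m in matches]
print(categories)  # ['structure_a', 'structure_b']
for m in matches:
    print(m.asList())
```

With this input there are only two top-level matches, and the inner structure_a shows up as a nested list inside the structure_b match, the same shape as the results above.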