With pyparsing, how do you ignore nested structures?


I am having trouble finding a way to ignore a structure when it is nested inside another type of structure. In the example below, I am trying to parse structure_a, but my results also include matches for structure_a blocks that are nested inside another structure. I don't want pyparsing to match those unless I match the outer structure first. How would I go about doing that?

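import pyparsing as pp
from pyparsing import (CaselessKeyword, Forward, Group, Optional, Word,
                       ZeroOrMore, alphanums, cStyleComment)

self.LBRACE, self.RBRACE, self.LBRACK, self.RBRACK, self.SEMI, self.COMMA, self.DOUBLEQUOTE = map(pp.Suppress, '{}[];,"')
# EQUAL is used below but its definition was not included in the post.
# A plain Literal is assumed here, since "=" shows up in the parsed results.
self.EQUAL = pp.Literal("=")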

def parse(self, data):
    template = CaselessKeyword("structure_a")

    words = Word(alphanums + "_" + "." + "[" + "]")

    recursive_grammar = Forward()
    recursive_grammar <<= (
        Group(words("type") + words("name") + self.LBRACE +      
              ZeroOrMore(recursive_grammar) + self.RBRACE |

              words("name") + self.LBRACE + 
              ZeroOrMore(recursive_grammar) + self.RBRACE |

              self.LBRACE + ZeroOrMore(recursive_grammar) + self.RBRACE |

              self.LBRACE + ZeroOrMore(words("type")) + self.RBRACE) |

        Group(words("name") + self.EQUAL + recursive_grammar |

              ZeroOrMore(words("type")) + words("name") + self.EQUAL + 
              words("value") + Optional(self.COMMA) |

              words("name") + self.EQUAL + words("value") + 
              Optional(self.COMMA))
    )

    grammar = (template("category") + words("type") + words("name") +
               self.LBRACE + ZeroOrMore(recursive_grammar)("members") + 
               self.RBRACE + Optional(cStyleComment)("short_description"))

    result = grammar.searchString(data)

    return result

# I want to match this structure
structure_a type name { 
    variable = 1
}

structure_b name {
    # I only want to match a nested structure_a if I create a new 
    # grammar to match structure_b that have structure_a nested in it.    
    # Otherwise I want to ignore nested structure_a
    structure_a type name { 
        variable = 2
    }
}

Currently my grammar matches structures inside structure_b as well as top-level elements. I don't want pyparsing to match anything inside structure_b unless I explicitly match structure_b first.


Solution

  • After writing the question out, posting it, and taking some time away from the problem, I think I have come up with a solution. I think the nested structure_a was being matched because the grammar had no match for the outer structure_b, so searchString simply kept scanning forward through the text and had no way of knowing that the inner structure_a was nested. I rewrote my code as follows, and it seems to work.

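    self.LBRACE, self.RBRACE, self.LBRACK, self.RBRACK, self.SEMI, self.COMMA, self.DOUBLEQUOTE = map(pp.Suppress, '{}[];,"')
    # EQUAL was not defined in the post; a plain Literal is assumed because "=" appears in the results below.
    self.EQUAL = pp.Literal("=")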
    
    def parse(self, data):
        template1 = CaselessKeyword("structure_a")
        template2 = CaselessKeyword("structure_b")
    
        words = Word(alphanums + "_" + "." + "[" + "]")
    
        recursive_grammar = Forward()
        recursive_grammar <<= (
            Group(words("type") + words("name") + self.LBRACE +  
                  ZeroOrMore(recursive_grammar) + self.RBRACE |
    
                  words("name") + self.LBRACE + ZeroOrMore(recursive_grammar) + 
                  self.RBRACE |
    
                  self.LBRACE + ZeroOrMore(recursive_grammar) + self.RBRACE |
    
                  self.LBRACE + ZeroOrMore(words("type")) + self.RBRACE | 
    
                  # added the nested structure to my recursive grammar
                  template1("category") + words("type") + words("name") +
                  self.LBRACE + ZeroOrMore(recursive_grammar)("members") + 
                  self.RBRACE + Optional(cStyleComment)("short_description")) |
    
            Group(words("name") + self.EQUAL + recursive_grammar |
    
                  ZeroOrMore(words("type")) + words("name") + self.EQUAL + 
                  words("value") + Optional(self.COMMA) |
    
                  words("name") + self.EQUAL + words("value") + Optional(self.COMMA))
        )
    
        grammar = (template1("category") + words("type") + words("name") +
                   self.LBRACE + ZeroOrMore(recursive_grammar)("members") + 
                   self.RBRACE + Optional(cStyleComment)("short_description") | 
    
                   # Match structure_b
                   template2("category") + words("name") + self.LBRACE +
                   ZeroOrMore(recursive_grammar)("members") + self.RBRACE + 
                   Optional(cStyleComment)("short_description")
        )
    
        result = grammar.searchString(data)
    
        return result
    
    # Same example from question...
    
    structure_a type name { 
        variable = 1
    }
    
    structure_b name {
        structure_a type name { 
            variable = 2
        }
    }
    
    # Results...
    
    Name: name
    Category: structure_a
    Type: type
    
    ['variable', '=', '1']
    
    
    
    Name: name
    Category: structure_b
    Type: 
    
    ['structure_a', 'type', 'name', ['variable', '=', '2']]
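
The idea behind the fix (let the grammar consume the enclosing block, so the scanner never starts a fresh match inside it) can be sketched in isolation. This is a stripped-down illustration, not the grammar from the post: `ident`, `body`, `other_block`, `scan`, and the sample text are all made-up names, and `nestedExpr` stands in for the hand-written recursive grammar.

```python
import pyparsing as pp

# Illustrative names only; none of these come from the original post.
ident = pp.Word(pp.alphas, pp.alphanums + "_.")
body = pp.nestedExpr("{", "}")  # balanced braces, nested to any depth

structure_a = (
    pp.CaselessKeyword("structure_a")("category")
    + ident("type") + ident("name") + body("members")
)

# Hypothetical catch-all: any other "keyword [name] { ... }" block.
# Matching it consumes the whole block, including anything nested inside.
other_block = ident("category") + pp.Optional(ident("name")) + body("members")

SAMPLE = """
structure_a type_x top {
    variable = 1
}

structure_b outer {
    structure_a type_y inner {
        variable = 2
    }
}
"""


def scan(expr, text):
    """Return the tokens of every match found while scanning text."""
    return [tokens for tokens, _start, _end in expr.scanString(text)]


# Scanning for structure_a alone also finds the nested one.
names_naive = [t["name"] for t in scan(structure_a, SAMPLE)]

# Scanning for either alternative lets other_block swallow structure_b whole;
# filtering on category then leaves only the top-level structure_a.
names_top_level = [
    t["name"]
    for t in scan(structure_a | other_block, SAMPLE)
    if t["category"] == "structure_a"
]

print(names_naive)
print(names_top_level)
```

With this sample text, the first scan should report both `top` and `inner`, while the second reports only `top`. The catch-all is deliberately loose. In a real grammar you would probably want it to be as specific as the structure_b alternative in the answer above, so that it cannot match fragments of unrelated text.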