Search code examples
pythonxmlsaxsaxparser

Can I somehow tell to SAX parser to stop at some element and get its child nodes as a string?


I have pretty big XML documents, so I don't want to use DOM, but while parsing a document with SAX parser I want to stop at some point (let's say when I reached element with a certain name) and get everything inside that element as a string. "Everything" inside is not necessary a text node, it may contain tags, but I don't want them to me parsed, I just want to get them as text.

I'm writing in Python. Is it possible to solve? Thanks!


Solution

  • It does not seem to be offered by the xml.sax API, but you can utilize another way of interrupting control flow: exceptions.

    Just define a custom exception for that purpose:

    class FinishedParsing(Exception):
        pass
    

    Raise this exception in your handler when you have finished parsing and simply ignore it.

    try:
        parser.parse(xml)
    except FinishedParsing:
        pass