Python SAX Parser: resolveEntity


I am having a hard time figuring out how to bind a ResolveEntityHandler of my own to a SAX parser. There is an answer on SO that covers this, but unfortunately I cannot reproduce its result.

When I run the following code, which is actually copied from the aforementioned answer, just updated to Python 3,

import io
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value.
    def resolveEntity(self, publicID, systemID):
        print("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
        return systemID

    def skippedEntity(self, name):
        print("TestHandler.skippedEntity(): %s" % name)

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print('TestHandler.startElement():', summary)

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = io.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print("ERROR %s" % e)

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

and the external test.dtd

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

What I got is

TestHandler.startElement(): step: 
TestHandler.skippedEntity(): not

Process finished with exit code 0

So my questions are:

  1. Why was resolveEntity never called?
  2. How do I bind a ResolveEntityHandler to the parser?

Solution

  • What you are seeing has to do with a change in Python 3.7.1:

    Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method setFeature() on the parser object and argument feature_external_ges.

    To get the same behaviour as in earlier versions, add these lines:

    from xml.sax.handler import feature_external_ges
    

    and, in the main function, before the call to parser.parse():

    parser.setFeature(feature_external_ges, True)
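
Putting the two pieces together, here is a minimal self-contained sketch of a parser with a custom entity resolver. It is an illustration under stated assumptions, not the code from the question: the names InMemoryDTDHandler, parse_with_resolver and DTD_BYTES are hypothetical, and the DTD is served from an in-memory byte string instead of a test.dtd file on disk so that the example runs anywhere.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Stand-in for the contents of test.dtd (assumption: served from memory, not disk).
DTD_BYTES = b'<!ENTITY num "FOO">'

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>text</test>
"""


class InMemoryDTDHandler(ContentHandler, EntityResolver):
    """Hypothetical handler: records calls and resolves the DTD from memory."""

    def __init__(self):
        super().__init__()
        self.resolved = []   # system IDs passed to resolveEntity
        self.summaries = []  # 'summary' attribute values seen in startElement

    def resolveEntity(self, publicId, systemId):
        # Called only when feature_external_ges is enabled on the parser.
        self.resolved.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(DTD_BYTES))
        return source

    def startElement(self, name, attrs):
        self.summaries.append(attrs.get('summary', ''))


def parse_with_resolver(xml_string):
    handler = InMemoryDTDHandler()
    parser = xml.sax.make_parser()
    # Features must be set before parsing starts.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.parse(io.StringIO(xml_string))
    return handler


handler = parse_with_resolver(XML)
print(handler.resolved, handler.summaries)
```

Returning an InputSource backed by an in-memory stream means the parser never has to touch the file system or the network for this entity, which is the access the 3.7.1 change disabled by default. If the setFeature() line is removed, resolveEntity is not called on Python 3.7.1 or later and the summary attribute comes back without the entity text, matching the output shown in the question.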