
Ignore XML doctype declarations in XMLReader (XXE)


I use a non-validating reader for displaying or processing untrusted XML documents. I do not need support for internal entities, but I do want to be able to process the documents even if a DOCTYPE is present.

With the disallow-doctype-decl feature of SAX I can make sure that parsing an XML document carries no risk of external entity resolution or "billion laughs" DoS expansion. This is also recommended by the OWASP XXE prevention cheat sheet.

XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setFeature("http://apache.org/xml/features/continue-after-fatal-error", true);

reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

// or
reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);    
reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

Unfortunately, this aborts parsing when a DOCTYPE is present:

org.xml.sax.SAXParseException; systemId: file:... ; lineNumber: 2; columnNumber: 10;
    DOCTYPE is disallowed when the
    feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
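
To make the failure mode concrete, here is a minimal sketch that reproduces it on an in-memory document. The class name `DisallowDoctypeSketch` and the helper `rejectsDoctype` are hypothetical names chosen for illustration, and the sketch assumes the JDK's built-in Xerces-based parser, which recognizes the Apache feature URI.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

class DisallowDoctypeSketch {

    // Returns true if parsing aborts with a SAXParseException.
    // Note: any other fatal well-formedness error is reported as true as well.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static boolean rejectsDoctype(String xml) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        // DefaultHandler rethrows fatal errors and ignores everything else.
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);

        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String withDoctype = "<!DOCTYPE doc [<!ENTITY who \"world\">]><doc>hello &who;</doc>";
        String withoutDoctype = "<doc>hello</doc>";
        System.out.println("with DOCTYPE rejected: " + rejectsDoctype(withDoctype));
        System.out.println("without DOCTYPE rejected: " + rejectsDoctype(withoutDoctype));
    }
}
```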

And if I ignore this fatal error (via continue-after-fatal-error), the parser happily resolves internal entities, as you can see here: https://gist.github.com/ecki/f84d53a58c48b13425a270439d4ed84a

Is there a combination of features that lets me read past the DOCTYPE declaration without evaluating it (especially avoiding recursive entity expansion)?

I would like to avoid defining my own Apache-specific security-manager property or a special resolver.


Solution

  • According to core-lib-dev, XMLReaderFactory is deprecated as of Java 9, and the way to obtain an XMLReader is through a SAXParser.

    In that case the secure-processing feature (FSP, XMLConstants.FEATURE_SECURE_PROCESSING) can be used. It establishes some resource limits and also removes the remote schema handlers for ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setXIncludeAware(false);
    // when FSP is activated explicitly, it also restricts external entities
    spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    XMLReader reader = spf.newSAXParser().getXMLReader();
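
For reference, a self-contained sketch of how such a reader might be wired up and exercised. The class name `FspReaderSketch` and the helpers `newSecureReader` and `parseText` are hypothetical, and the sample document is made up for illustration. Note that FSP does not skip the DOCTYPE: small internal entities are still expanded, while the expansion is subject to the parser's resource limits.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FspReaderSketch {

    // Builds an XMLReader through SAXParserFactory with FSP set explicitly.
    static XMLReader newSecureReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setXIncludeAware(false);
        spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return spf.newSAXParser().getXMLReader();
    }

    // Parses the document and returns the concatenated character data.
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = newSecureReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input with a DOCTYPE and one small internal entity.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE doc [ <!ENTITY who \"world\"> ]>\n"
                + "<doc>hello &who;</doc>";
        System.out.println(parseText(xml));
    }
}
```

With the JDK's default parser, the document is accepted despite the DOCTYPE, and the internal entity `&who;` appears expanded in the collected text.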