Providing DTD file to StAX parser


I am using StAX to process an XML file. The document has a DOCTYPE declaration that references a DTD file:

<!DOCTYPE onlineDoc SYSTEM "onlineDoc.dtd">

I get the XML from the internet (I am streaming it), and the DTD file sits right next to the XML (but, like the XML, on the remote machine).

The DTD contains some entity declarations that are used in the XML, e.g.

<!ENTITY Ntilde "&#209;" ><!-- capital N, tilde -->

I don't provide the DTD yet, so the StAX parser throws an exception saying that the entity Ntilde cannot be resolved.

Q: How do I provide the DTD file to the parser? Ideally it would be a stream from the internet.


Solution

  • With Woodstox this works. Here is my snippet (it uses Spring's ClassPathResource class):

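    XMLInputFactory xif = XMLInputFactory.newFactory();
    xif.setXMLResolver(new XMLResolver() {
      @Override
      public Object resolveEntity(String publicID, String systemID, String baseURI, String namespace) throws XMLStreamException {
        // Serve only the DTD named in the DOCTYPE; returning null leaves
        // everything else to the parser's default resolution.
        if (!"onlineDoc.dtd".equals(systemID))
          return null;
        try {
          return new ClassPathResource(systemID, getClass()).getInputStream();
        }
        catch (IOException e) {
          // Report the failure instead of hiding it behind an "unresolved entity" error.
          throw new XMLStreamException("Cannot load " + systemID, e);
        }
      }
    });
    XMLStreamReader reader = xif.createXMLStreamReader(new ClassPathResource("a.xml", this.getClass()).getInputStream());
    while (reader.hasNext()) {
      reader.next();
      if (reader.isCharacters())
        // getText() returns just the current text; getTextCharacters() may expose the whole internal buffer.
        log.info(reader.getText());
    }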
    

    It works. For this document:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <!DOCTYPE onlineDoc SYSTEM "onlineDoc.dtd">
    <onlineDoc>
        <test>a &Ntilde; b</test>
    </onlineDoc>
    

    it prints:

    a Ñ b
    

    With Maven, add this dependency:

    <dependency>
       <groupId>org.codehaus.woodstox</groupId>
       <artifactId>woodstox-core-asl</artifactId>
       <version>4.1.2</version>
    </dependency>
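
The snippet above loads the DTD from the classpath, while the question asked for a stream from the internet. Here is a minimal sketch of that variant, assuming the DTD sits next to the XML on the remote host and that the parser passes the DOCTYPE's system ID to the resolver. The names RemoteDtdExample, resolveSystemId, remoteResolver and open are made up for this illustration, and I have only reasoned through the URL handling, not run it against a live server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX or Woodstox.
class RemoteDtdExample {

    // Resolves a (possibly relative) system ID against the URL the XML came from.
    // An already absolute system ID is returned unchanged.
    static URI resolveSystemId(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId);
    }

    // Builds a resolver that fetches external entities (here: the DTD) over the network.
    static XMLResolver remoteResolver(final String documentUrl) {
        return new XMLResolver() {
            @Override
            public Object resolveEntity(String publicID, String systemID,
                                        String baseURI, String namespace)
                    throws XMLStreamException {
                if (systemID == null) {
                    return null; // nothing to locate; let the parser decide
                }
                try {
                    return resolveSystemId(documentUrl, systemID).toURL().openStream();
                } catch (IOException e) {
                    throw new XMLStreamException("Cannot fetch " + systemID, e);
                }
            }
        };
    }

    // Opens the remote XML and wires in the resolver. The caller closes the reader.
    static XMLStreamReader open(String documentUrl)
            throws IOException, XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setXMLResolver(remoteResolver(documentUrl));
        InputStream in = URI.create(documentUrl).toURL().openStream();
        // Passing the document URL as the system ID gives the parser a base URI.
        return xif.createXMLStreamReader(documentUrl, in);
    }
}
```

Usage would be something like RemoteDtdExample.open("http://example.com/docs/a.xml"), where the URL is a placeholder. Note that this resolver fetches whatever system ID the document names, so for untrusted input you probably want to restrict it to known hosts or file names, as the classpath version does with its equals check.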