java xml dtd xmlstreamreader

How can we parse the DOCTYPE information using XMLEventReader?


I have some existing code which parses the top-level element namespace to determine what kind of XML file we're looking at.

XMLEventReader reader = createXMLEventReader(...);
try {
    while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        switch (event.getEventType()) {
            case XMLStreamConstants.DTD:
                // No particularly useful information here?
                //((DTD) event).getDocumentTypeDeclaration();
                break;

            case XMLStreamConstants.START_ELEMENT:
                formatInfo.qName = ((StartElement) event).getName();
                return formatInfo;

            default:
                break;
        }
    }
} finally {
    reader.close();
}

If I allow the parser to load DTDs from the web, getDocumentTypeDeclaration() returns a gigantic string containing far more information than I know how to deal with, because the parser inserts all related DTDs into the string before handing it over. On the other hand, if I block the parser from loading DTDs from the web (which is preferable anyway, to avoid network access during parsing), it only gives me the string "<!DOCTYPE".

Is there no way to get back the values inside the DOCTYPE?

I'm using the default parser which ships with the JRE, in case that matters.


Solution

  • I know it's an old post, but I couldn't find an answer on the web until I found your question, which pointed me in the right direction.

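    Here the entities declared in the DTD, including external unparsed entities, are read from the DTD event via DTD#getEntities(). The DTD event is picked out by switching on the value returned by XMLEvent#getEventType(), and an XMLResolver keeps the parser from fetching anything external.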

    XMLInputFactory factory = XMLInputFactory.newInstance();
    factory.setXMLResolver(new XMLResolver() {
        @Override
        public Object resolveEntity(String publicID, String systemID,
                String baseURI, String namespace) throws XMLStreamException {
            // return an empty input stream so that external entities are never fetched
            return new InputStream() {
                @Override
                public int read() throws IOException {
                    return -1;
                }
            };
        }
    });
    
    XMLEventReader reader = factory.createXMLEventReader( . . . );
    try {
        while(reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            switch (event.getEventType()) {
                case XMLStreamConstants.DTD:
                    List<EntityDeclaration> entities = ((DTD)event).getEntities();
                    if (entities != null) {
                        for (EntityDeclaration entity : entities)
                            System.out.println(entity.getName() + " = " + entity.getSystemId());
                    }
                    break;
                case . . .
            }
        }
    } finally {
        reader.close();
    }
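
For anyone who wants to try the approach above end to end, here is a minimal self-contained sketch. The class name DoctypeEntityDemo, the helper entityNames, and the sample document with its example.invalid URLs are all made up for illustration. It assumes the StAX implementation consults the XMLResolver for the external DTD subset and accepts an empty stream in its place, which is what the answer above relies on. Other implementations may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.DTD;
import javax.xml.stream.events.EntityDeclaration;
import javax.xml.stream.events.XMLEvent;

class DoctypeEntityDemo {

    // Hypothetical sample: the DOCTYPE names an external subset (never fetched)
    // and its internal subset declares one unparsed entity called "logo".
    static final String SAMPLE =
            "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\" [\n"
          + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
          + "  <!ENTITY logo SYSTEM \"http://example.invalid/logo.gif\" NDATA gif>\n"
          + "]>\n"
          + "<note/>";

    // Collects the names of the entities reported by the DTD event.
    static List<String> entityNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Answer every external lookup with an empty stream, so nothing is downloaded.
        factory.setXMLResolver((publicID, systemID, baseURI, namespace) ->
                new ByteArrayInputStream(new byte[0]));

        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.getEventType() == XMLStreamConstants.DTD) {
                        List<EntityDeclaration> entities = ((DTD) event).getEntities();
                        if (entities != null) {
                            for (EntityDeclaration entity : entities) {
                                names.add(entity.getName());
                            }
                        }
                    } else if (event.isStartElement()) {
                        break; // the DOCTYPE always precedes the root element
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entityNames(SAMPLE));
    }
}
```

The sketch only collects entity names. EntityDeclaration also offers getSystemId(), getPublicId() and getNotationName() if you need more than the name.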