I'm using the following code to parse an org.w3c.dom.Document with a javax.xml.parsers.SAXParser.
try
{
// --- Prepare our SAX parser ---
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
SAXParser parser = factory.newSAXParser();
// parser.parse(xmlFile, xmlValidator); /* Does not validate unsaved changes */
// --- Create a stream from our already parsed xml document ---
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Source xmlSource = new DOMSource(xmlDocument);
Result outputTarget = new StreamResult(outputStream);
TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);
// --- Validate the xmlDocument ---
parser.parse(new ByteArrayInputStream(outputStream.toByteArray()), xmlValidator);
}
catch (ParserConfigurationException | SAXException | TransformerException | TransformerFactoryConfigurationError | IOException e)
{
e.printStackTrace();
}
When the document is parsed, I get the error message:
Line 1: Document root element 'MyRootName' must match DOCTYPE root 'null'.
If I just parse the xmlFile that the xmlDocument is based on, everything works fine.
I have ensured that xmlDocument is initialised and valid. I've even tried passing xmlDocument.getDocumentElement() to the DOMSource, after confirming that it is what I expect (i.e. the root node of the document, with the correct name).
Why isn't the javax.xml.parsers.SAXParser reading the java.io.InputStream in the same way it reads xmlFile from the file system?
Related question (I've tried all of these solutions to no avail): how to create an InputStream from a Document or Node
I have found the cause, detailed here: Parsing xml with DOM, DOCTYPE gets erased
So the issue wasn't with the parser; it was with the Transformer, which was stripping the <!DOCTYPE ...> line out of the XML. To solve this, set the DOCTYPE_SYSTEM output property on the transformer so that the serialized output declares the DTD again.
// --- Create a transformer and transform our Document into an InputStream ---
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// By default the transformer strips out the DOCTYPE tag so we must re-add our DTD file declaration
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, xmlFile.getParent() + "\\" + xmlDocument.getDoctype().getSystemId());
transformer.transform(xmlSource, outputTarget);
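To check the fix in isolation, here is a minimal, self-contained sketch. The class name `DoctypeRoundTrip`, its helper methods and the `my.dtd` system ID are made up for illustration; only the root name `MyRootName` comes from the error message in the question. Whether the DOCTYPE is dropped when the property is not set may depend on the transformer implementation shipped with your JDK, so the sketch only relies on the declaration being written when `DOCTYPE_SYSTEM` is set.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class DoctypeRoundTrip {

    /** Builds an in-memory document with a DOCTYPE (hypothetical system ID "my.dtd"). */
    public static Document sampleDocument() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        DocumentType doctype = impl.createDocumentType("MyRootName", null, "my.dtd");
        return impl.createDocument(null, "MyRootName", doctype);
    }

    /** Serializes the document, optionally copying its DOCTYPE onto the output. */
    public static String serialize(Document doc, boolean keepDoctype) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (keepDoctype && doctype != null && doctype.getSystemId() != null) {
            // Setting DOCTYPE_SYSTEM makes the serializer emit <!DOCTYPE root SYSTEM "...">
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            if (doctype.getPublicId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument();
        System.out.println(serialize(doc, false)); // DOCTYPE not requested
        System.out.println(serialize(doc, true));  // DOCTYPE requested via output property
    }
}
```

Comparing the two printed strings shows whether your transformer keeps the declaration on its own; the second one is what a validating parser needs to see.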
If you pass in only the DTD file name, the parser will look for it in the directory the program was launched from. It is therefore advisable to specify the full path to the DTD file, as in the code above that sets DOCTYPE_SYSTEM.
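If the hard-coded "\\" separator is a concern on other platforms, one possible alternative is to build the system ID from java.io.File. This is only a sketch: the class `DtdSystemId` and its `resolve` method are hypothetical names, and it assumes that xmlFile is a java.io.File and that the DOCTYPE's system ID is a plain relative file name sitting next to the XML file.

```java
import java.io.File;

public class DtdSystemId {

    /**
     * Resolves a relative DTD file name against the directory that contains
     * the XML file and returns it as a file: URI string.
     * Assumes systemId is a relative file name, not an absolute path or URL.
     */
    public static String resolve(File xmlFile, String systemId) {
        File parentDir = xmlFile.getAbsoluteFile().getParentFile();
        return new File(parentDir, systemId).toURI().toString();
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only
        System.out.println(resolve(new File("data/doc.xml"), "my.dtd"));
    }
}
```

The result could then be passed to `transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, ...)` in place of the concatenated string.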