I work with PTC Arbortext Editor, which was originally written in the pre-XML (SGML) days of the late 1980s. I have a Java application that uses XMLUnit (org.custommonkey.xmlunit) to diff XML files.
The diff tool fails to parse files that depend on Arbortext's catalogs. Arbortext expects (on Windows) a semicolon-separated list of absolute paths to folders in which it looks for catalog and/or catalog.xml files. These catalog files may use the CATALOG directive, and they map PUBLIC identifiers to paths that are relative to the particular catalog file.
I need to parse XML using this catalog information; the XML may contain file entities as well as XML inclusions.
For some use cases I can set validating to false, and that works (it is reasonable to assume the two files are valid), but for some files I have to read the catalog information to resolve file entities in the XML.
I can ask the user to provide a list of absolute paths to their top-level catalog locations. However, I am rather lost when it comes to selecting a resolver and integrating it into my code.
I am using Java 1.8 but don't mind moving to 10 if that would help or simplify things. It looks like Java 9 introduced catalog support in javax.xml.catalog, but that isn't available in 1.8.
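For reference, on Java 9 and later the built-in javax.xml.catalog resolver can be set as a DocumentBuilder's EntityResolver. This is a minimal sketch, not a drop-in fix: that API reads OASIS XML Catalogs (catalog.xml) only, not the plain-text TR9401 catalog files Arbortext also uses. The class name JdkCatalogSketch, the demo files, and the "-//Demo//DTD Note//EN" identifier are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical demo class; requires Java 9+ for javax.xml.catalog. */
public final class JdkCatalogSketch {

    /** Parses xmlFile, resolving external identifiers through an OASIS XML catalog. */
    public static Document parse(File xmlFile, URI catalogXml) throws Exception {
        // RESOLVE=continue: if the catalog has no entry, the parser falls back to the system id
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
        CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogXml);

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver); // javax.xml.catalog.CatalogResolver is also a SAX EntityResolver
        return builder.parse(xmlFile);
    }

    /** Writes a tiny catalog, DTD and document to a temp folder, then parses the document. */
    public static Document demo() throws Exception {
        Path dir = Files.createTempDirectory("jdk-catalog");
        write(dir.resolve("catalog.xml"),
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
              + "  <public publicId=\"-//Demo//DTD Note//EN\" uri=\"note.dtd\"/>\n"
              + "</catalog>\n");
        write(dir.resolve("note.dtd"), "<!ELEMENT note (#PCDATA)>\n<!ENTITY who \"catalog\">\n");
        // The system id points at a file that does not exist, so only the catalog can supply the DTD
        write(dir.resolve("note.xml"),
                "<?xml version=\"1.0\"?>\n"
              + "<!DOCTYPE note PUBLIC \"-//Demo//DTD Note//EN\" \"missing/note.dtd\">\n"
              + "<note>resolved via &who;</note>\n");
        return parse(dir.resolve("note.xml").toFile(), dir.resolve("catalog.xml").toUri());
    }

    private static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo().getDocumentElement().getTextContent());
    }
}
```

Because the &who; entity is declared only in the catalog-supplied DTD, a successful parse shows that the DTD really came through the catalog.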
I can provide my parsing code if that matters, but I'm not stuck on any one parser.
My code is below. I switched from LSParser to DocumentBuilder for the sake of setValidating(false).
Here are a couple of excerpts from one of the files I'd like to be able to work with:
<?xml version="1.0" encoding="UTF-8"?>
<!--Arbortext, Inc., 1988-2016, v.4002-->
<!DOCTYPE Composer PUBLIC "-//Arbortext//DTD Composer 1.0//EN"
"../doctypes/composer/composer.dtd" [
<!ENTITY % stock PUBLIC "-//Arbortext//DTD Fragment - ATI Stock filter list//EN" "../composer/stock.ent">
%stock;
]>
<?Pub Inc?>
<Composer>
<Label>Compose to PDF</Label>
. . .
<Resource>
<Label></Label>
<Documentation></Documentation>&epicGenerator;
&fileSerializer;
&serverProfiler;
&clientProfiler;
&xslTransformer;
&epicSerializer;
&switch;
&errorHandler;
&namespaceFixer;
&atiEventConverter;
&foPropagator;
&extensionHandler;
&ditaPostProcessor;
&ditaStyledElementsTranslator;
&atictFilter;
&applicabilityFilter;
</Resource>
And here are a few lines from one of the catalog files I need to reference:
PUBLIC "-//Arbortext//ENTITIES SAX Event Upstream Loop//EN" "upstreamLoop.ent"
PUBLIC "-//Arbortext//ENTITIES keyRef Resolver//EN" "keyRefResolver.ent"
PUBLIC "-//Arbortext//ENTITIES ATI Change Tracking Filter 1.0//EN" "atictFilter.ent"
PUBLIC "-//Arbortext//ENTITIES Font Filter 1.0//EN" "fontFilter.ent"
PUBLIC "-//Arbortext//ENTITIES Simple Attribute Cascader//EN" "simpleAttrCascader.ent"
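The excerpt above is in the plain-text TR9401 catalog format. To make the lookup chain concrete, here is a minimal hand-rolled sketch of what a resolver has to do with such files: tokenize the catalog, record PUBLIC entries relative to the catalog's own folder, then follow CATALOG directives into further catalogs. It is an illustration under stated assumptions, not a complete TR9401 implementation: it treats system identifiers as file paths, reads files as UTF-8, and merely skips SYSTEM, BASE, OVERRIDE, DELEGATE and similar entries. Tr9401CatalogSketch and loadPublicIds are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reads PUBLIC entries from TR9401 text catalogs, following CATALOG. */
public final class Tr9401CatalogSketch {

    /** Argument counts of entry types this sketch skips (simplified reading of TR9401). */
    private static final Map<String, Integer> SKIPPED_ARITY = new HashMap<>();
    static {
        for (String k : new String[] {"SYSTEM", "ENTITY", "DOCTYPE", "LINKTYPE", "NOTATION", "DELEGATE"}) {
            SKIPPED_ARITY.put(k, 2);
        }
        for (String k : new String[] {"OVERRIDE", "SGMLDECL", "DOCUMENT", "BASE"}) {
            SKIPPED_ARITY.put(k, 1);
        }
    }

    /** Returns normalized public id -> absolute file, for catalogFile and every catalog it references. */
    public static Map<String, Path> loadPublicIds(Path catalogFile) throws IOException {
        Map<String, Path> result = new LinkedHashMap<>();
        load(catalogFile.toAbsolutePath().normalize(), result, new HashSet<Path>());
        return result;
    }

    private static void load(Path catalog, Map<String, Path> out, Set<Path> seen) throws IOException {
        if (!seen.add(catalog)) {
            return; // already read: guards against CATALOG cycles
        }
        Path dir = catalog.getParent(); // relative system ids resolve against the catalog's folder
        List<String> t = tokenize(new String(Files.readAllBytes(catalog), StandardCharsets.UTF_8));
        List<Path> nested = new ArrayList<>();
        for (int i = 0; i < t.size(); i++) {
            String keyword = t.get(i).toUpperCase(Locale.ROOT);
            if (keyword.equals("PUBLIC") && i + 2 < t.size()) {
                // first definition wins, so a catalog's own entries beat those of catalogs it references
                out.putIfAbsent(normalizePublicId(t.get(i + 1)), dir.resolve(t.get(i + 2)).normalize());
                i += 2;
            } else if (keyword.equals("CATALOG") && i + 1 < t.size()) {
                nested.add(dir.resolve(t.get(i + 1)).normalize());
                i += 1;
            } else if (SKIPPED_ARITY.containsKey(keyword)) {
                i += SKIPPED_ARITY.get(keyword); // skip the arguments of entries this sketch ignores
            }
        }
        for (Path next : nested) {
            load(next, out, seen); // referenced catalogs are read after this one's own entries
        }
    }

    /** Collapses whitespace runs, since public ids compare after whitespace normalization. */
    static String normalizePublicId(String id) {
        return id.trim().replaceAll("\\s+", " ");
    }

    /** Splits catalog text into bare words and quoted literals, dropping -- comments --. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (text.startsWith("--", i)) {
                int end = text.indexOf("--", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                int end = text.indexOf(c, i + 1);
                if (end < 0) {
                    end = n; // unterminated literal: take the rest of the file
                }
                tokens.add(text.substring(i + 1, end));
                i = end + 1;
            } else {
                int start = i;
                while (i < n && !Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            }
        }
        return tokens;
    }
}
```

Called as Tr9401CatalogSketch.loadPublicIds(Paths.get("Test/doctypes/catalog")), it would follow the chain described in the question down to Test/other/forms/catalog.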
I also looked at "Validate XML using XSD, a Catalog Resolver, and JAXP DOM for XSLT". I suspect it won't solve my problem, but I could be wrong.
I also reviewed the following web sites:
I have uploaded the Java code, directory structure, and XML to http://aapro.net/CatalogTest.zip.
It should be possible to add something to my program that accepts a path to the Test/doctypes folder (the folder, not the catalog file in it), so that Test/CatalogTest.xml parses successfully when the "Validate" option the program prompts for is chosen. Other (expensive) SGML/XML-aware software can do this. Once given the absolute path to the Test/doctypes folder, the catalog resolver should be able to follow the CATALOG directive in Test/doctypes/catalog to Test/other/forms/catalog, and from there to Test/other/forms/forms.dtd. The parser should then be able to parse Test/other/forms/forms.dtd and use it to validate Test/CatalogTest.xml.
Ideally, this whole process should handle both plain catalog files and catalog.xml files, parse either DTD or XSD files, and handle SGML or XML instances. But I don't really care much about SGML; there are only a few milspec situations in my working environment that still use it.
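Collecting the catalog files from a user-supplied, semicolon-separated list of folders could look like the following sketch. It assumes each folder may contain a file named exactly catalog, catalog.xml, or both, and returns the ones that exist in list order; CatalogPathSketch and findCatalogFiles are hypothetical names. A caller can then choose a text-catalog or XML-catalog reader based on the file name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a semicolon-separated folder list into catalog file paths. */
public final class CatalogPathSketch {

    /** File names looked for in each folder, in this order (assumed naming convention). */
    private static final String[] CATALOG_NAMES = {"catalog", "catalog.xml"};

    /** Returns every existing catalog or catalog.xml file in the listed folders, in list order. */
    public static List<Path> findCatalogFiles(String folderList) {
        List<Path> found = new ArrayList<>();
        for (String entry : folderList.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate ";;" and a trailing ";"
            }
            Path folder = Paths.get(trimmed);
            for (String name : CATALOG_NAMES) {
                Path candidate = folder.resolve(name);
                if (Files.isRegularFile(candidate)) {
                    found.add(candidate.toAbsolutePath().normalize());
                }
            }
        }
        return found;
    }

    /** True for catalog.xml-style files; everything else is treated as a TR9401 text catalog. */
    public static boolean isXmlCatalog(Path catalogFile) {
        return catalogFile.getFileName().toString().toLowerCase().endsWith(".xml");
    }
}
```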
I'd be willing to try more than one resolver and/or parser, or let the user make the selection.
Here is my code (also in the aforementioned zip file):
import java.io.File;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseXmlWithCatalog {

    public static void main(String[] args) {
        int validating = JOptionPane.showOptionDialog(null, "Do you want validation?", "Please choose \"Yes\" for validation",
                JOptionPane.YES_NO_OPTION, JOptionPane.QUESTION_MESSAGE, null, null, JOptionPane.YES_OPTION);
        parseDoc(getFile(args), validating == JOptionPane.YES_OPTION);
    }

    private static boolean parseDoc(File inFile, boolean validate) {
        if (inFile == null) {
            JOptionPane.showMessageDialog(null, "Failure opening input XML.");
            return false; // nothing to parse; continuing would throw a NullPointerException
        }
        try {
            /*
            System.setProperty(DOMImplementationRegistry.PROPERTY, "org.apache.xerces.dom.DOMImplementationSourceImpl");
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser builder = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            LSParserFilter filter = new InputFilter();
            builder.setFilter(filter);
            */
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            if (!validate) {
                // Skip validation and do not even load the external DTD
                builderFactory.setValidating(false);
                builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            }
            // With validate == true the factory defaults apply: the external DTD is loaded and
            // entities are expanded, but DTD validation itself stays off unless setValidating(true) is called
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // parse(File) converts the file to a URI itself; parse(String) expects a URI, not a file path
            Document testDoc = builder.parse(inFile);
            // getFirstChild() may be a comment or the DOCTYPE node; getDocumentElement() is the root element
            System.out.println(testDoc.getDocumentElement().getNodeName());
        } catch (Exception exc) {
            JOptionPane.showMessageDialog(null, "Failure parsing input XML: " + exc.getMessage());
            return false;
        }
        return true;
    }

    public static File getFile(String[] args) {
        if (args.length > 1) {
            JOptionPane.showMessageDialog(null, "Too many arguments.");
            return null;
        }
        if (args.length == 1) {
            return new File(args[0]);
        }
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setMultiSelectionEnabled(false);
        fileChooser.setDialogTitle("Select 1 XML file");
        FileNameExtensionFilter filter = new FileNameExtensionFilter("XML Files", "xml", "ditamap", "dita", "style");
        fileChooser.setFileFilter(filter);
        int response = fileChooser.showOpenDialog(null);
        if (response != JFileChooser.APPROVE_OPTION) {
            // aborted
            return null;
        }
        return fileChooser.getSelectedFile();
    }
}
The Apache XML Commons Resolver supports both OASIS XML Catalogs and the older OASIS TR9401 Catalogs format. See https://xerces.apache.org/xml-commons/components/resolver/.
To enable catalog lookup in your test project, do as follows:
Download XML Commons Resolver from http://xerces.apache.org/mirrors.cgi#binary.
Extract resolver.jar and add it to your classpath.
Create a text file called CatalogManager.properties and put it on your classpath. In this file, add the path to the catalog(s):
catalogs=./doctypes/catalog
The locations of catalog files can also be specified via the xml.catalog.files Java system property (a semicolon-separated list, like the catalogs entry).
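If the folder list comes from the user at run time rather than from a properties file, the system property can be set from code. A sketch, assuming the catalog files have already been located (CatalogFilesPropertySketch and its methods are hypothetical): it joins them into a semicolon-separated list of absolute file: URIs, which avoids a Windows drive letter such as C: being mistaken for a URL scheme. As far as I understand, the resolver's CatalogManager reads this property when it is first used, so set it before creating the first CatalogResolver.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for configuring the Apache XML Commons Resolver from code. */
public final class CatalogFilesPropertySketch {

    /** System property consulted by the resolver's CatalogManager. */
    public static final String CATALOG_FILES_PROPERTY = "xml.catalog.files";

    /** Joins catalog files into a semicolon-separated list of absolute file: URIs. */
    public static String toCatalogFilesValue(List<Path> catalogFiles) {
        return catalogFiles.stream()
                .map(p -> p.toAbsolutePath().normalize().toUri().toString())
                .collect(Collectors.joining(";"));
    }

    /** Sets xml.catalog.files; call this before the first CatalogResolver is created. */
    public static void configure(List<Path> catalogFiles) {
        System.setProperty(CATALOG_FILES_PROPERTY, toCatalogFilesValue(catalogFiles));
        // Afterwards, with resolver.jar on the classpath (not needed to compile this class):
        //   builder.setEntityResolver(new org.apache.xml.resolver.tools.CatalogResolver());
    }
}
```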
In ParseXmlWithCatalog.java, add an import statement and create an instance of CatalogResolver. Set that instance as the parser's EntityResolver:
import org.apache.xml.resolver.tools.CatalogResolver;
...
CatalogResolver cr = new CatalogResolver();
builder.setEntityResolver(cr);
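The resolver only has to implement org.xml.sax.EntityResolver, so the integration point can be exercised without resolver.jar. The sketch below uses a hypothetical map-based resolver (not part of any library) as a stand-in for CatalogResolver: it maps PUBLIC identifiers to local files and returns null for anything unknown, which tells the parser to fall back to the system identifier. Setting the returned InputSource's system ID to the resolved file's URI matters, because relative references declared inside that DTD are then resolved against the DTD's own location.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for a catalog resolver, showing where setEntityResolver plugs in. */
public final class MapEntityResolverSketch {

    /** Builds an EntityResolver that looks up PUBLIC identifiers in a prepared map. */
    public static EntityResolver fromPublicIds(Map<String, Path> publicIds) {
        return (publicId, systemId) -> {
            Path target = (publicId == null) ? null : publicIds.get(publicId.trim().replaceAll("\\s+", " "));
            if (target == null) {
                return null; // unknown id: let the parser open systemId itself
            }
            InputSource source = new InputSource(target.toUri().toString()); // base for relative refs
            source.setPublicId(publicId);
            return source;
        };
    }

    /** Parses xml with the map-based resolver installed, as CatalogResolver would be. */
    public static Document parse(File xml, Map<String, Path> publicIds) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(fromPublicIds(publicIds));
        return builder.parse(xml);
    }
}
```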