Tags: xml, xslt, saxon, xerces, xinclude

How to resolve XInclude instructions in an XML file from the command line with XSLT 3.0


Our XML data is stored in separate files so that staff can work individually on simple modules. The separate files are assembled into one master file for further processing. Currently I do this within the Oxygen XML Editor IDE. To streamline the process, I would like to do it from the command line, without the IDE. How can I resolve the XInclude statements from the command line with Saxon HE (if this is possible)?

I tried a command like this:

java -jar saxon9he.jar -xi:on -s:main.xml -xsl:assemble.xslt -o:master.xml -t

and got the following error:

Saxon-HE 9.9.1.4J from Saxonica
Java version 1.8.0_191
Stylesheet compilation time: 361.152836ms
Processing file:/u:/Wolke/xml/resolve-xi/main.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/u:/Wolke/xml/resolve-xi/main.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Exception in thread "main" java.lang.StackOverflowError
        at java.security.AccessController.doPrivileged(Native Method)
        at com.sun.org.apache.xerces.internal.utils.SecuritySupport.getContextClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.findClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.newInstance(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.handleIncludeElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.emptyElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
[and many more lines]

Saxonica's documentation on the -xi:on option says: "Apply XInclude processing to all input XML documents (including schema and stylesheet modules as well as source documents). This currently only works when documents are parsed using the Xerces parser, which is the default in JDK 1.5 and later." (https://www.saxonica.com/documentation9.5/using-xsl/commandline.html) I am not sure what this means.
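In other words, a plain JDK already ships a Xerces-derived parser: the "Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser" line in the log above is that bundled copy. The minimal sketch below shows the standard JAXP way of asking for an XInclude-aware SAX parser and prints which parser class you actually get. It is an assumption that this is roughly what -xi:on amounts to, not a description of Saxon's internals; the class and method names are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class XIncludeParser {

    // Hypothetical helper: asks JAXP for a SAX reader that resolves xi:include
    // while parsing. Unless another parser factory is configured, a plain JDK
    // returns its bundled Xerces copy (com.sun.org.apache.xerces.internal...).
    static XMLReader newXIncludeReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // xi:include is recognised by its namespace
        factory.setXIncludeAware(true);  // JAXP switch for XInclude processing
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Prints the implementation class of the XInclude-aware reader.
        System.out.println(newXIncludeReader().getClass().getName());
    }
}
```

On an OpenJDK or Oracle runtime with no third-party parser on the classpath, the printed class name should match the one in the log above.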

Main XML file (main.xml):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt><title>Trying to make XInclude work</title></titleStmt>
            <publicationStmt><p>Sample data for stackoverflow question</p></publicationStmt>
            <sourceDesc><p>Just made up</p></sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file1.xml" xpointer="content-p1"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file2.xml" xpointer="content-p2"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file3.xml" xpointer="content-p3"/>
        </body>
    </text>
</TEI>

XML component files (file1.xml, file2.xml and file3.xml):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
    <text>
        <body>
            <div type="page" xml:id="content-p1">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
        </body>
    </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p2">
            <p>Quisque gravida venenatis varius.</p>
         </div>
      </body>
   </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p3">
            <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
         </div>
      </body>
   </text>
</TEI>

XSLT (assemble.xslt):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
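For comparison, the same assembly step can be sketched in Java with only the JDK: an XInclude-aware SAX reader feeds the source document to the identity stylesheet above. This is a swapped-in technique, not the Saxon command line. The JDK's built-in TransformerFactory implements XSLT 1.0 only, so the stylesheet is declared as version 1.0 here. The class and method names are hypothetical. Resolving xpointer shorthand pointers to xml:id with the stock JDK parser is not guaranteed (the Solution section below relies on a patched Xerces for that), so the sketch is best tried first with whole-document includes.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class AssembleMaster {

    // The identity stylesheet from the question, declared as 1.0 because the
    // JDK's built-in processor only implements XSLT 1.0.
    static final String IDENTITY =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:template match='node() | @*'>"
        + "<xsl:copy><xsl:apply-templates select='node() | @*'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: parses mainXml with XInclude resolution switched on,
    // runs the identity transform and returns the serialized master document.
    static String assemble(File mainXml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setXIncludeAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // The system ID lets relative href values resolve against main.xml's folder.
        SAXSource source = new SAXSource(reader, new InputSource(mainXml.toURI().toString()));

        Transformer identity = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY)));
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java AssembleMaster main.xml master.xml
        String master = assemble(new File(args[0]));
        Files.write(Paths.get(args[1]), master.getBytes(StandardCharsets.UTF_8));
    }
}
```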

The output I need (as produced by the Oxygen IDE):

<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt>
                <title>Trying to make XInclude work</title>
            </titleStmt>
            <publicationStmt>
                <p>Sample data for stackoverflow question</p>
            </publicationStmt>
            <sourceDesc>
                <p>Just made up</p>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <div type="page" xml:id="content-p1" xml:base="file1.xml">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
            <div type="page" xml:id="content-p2" xml:base="file2.xml">
                <p>Quisque gravida venenatis varius.</p>
            </div>
            <div type="page" xml:id="content-p3" xml:base="file3.xml">
                <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
            </div>
        </body>
    </text>
</TEI>
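The xml:base attributes in this output come from XInclude's base URI fixup: each included top-level element records the location it was taken from. The sketch below checks this with the JDK's bundled parser and toggles the Xerces feature http://apache.org/xml/features/xinclude/fixup-base-uris, which is on by default. Assumptions: the stock JDK parser (no patched Xerces), hypothetical class and method names, and toy files that use a whole-document include instead of xpointer.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class BaseUriFixup {

    // Xerces feature controlling xml:base fixup on included elements (default: true).
    static final String FIXUP_BASE_URIS =
        "http://apache.org/xml/features/xinclude/fixup-base-uris";

    // Hypothetical helper: parses mainXml with XInclude enabled and returns the
    // xml:base value of the first element below the root ("" if none was added).
    static String includedBase(File mainXml, boolean fixup) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setXIncludeAware(true);
        dbf.setFeature(FIXUP_BASE_URIS, fixup);
        Element root = dbf.newDocumentBuilder().parse(mainXml).getDocumentElement();
        Element first = (Element) root.getElementsByTagName("*").item(0);
        return first.getAttributeNS(XMLConstants.XML_NS_URI, "base");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xml-base-demo");
        Files.write(dir.resolve("file1.xml"),
            "<div xml:id=\"content-p1\"><p>Integer sit amet.</p></div>"
                .getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("main.xml"),
            ("<body xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
           + "<xi:include href=\"file1.xml\"/></body>").getBytes(StandardCharsets.UTF_8));
        File main = dir.resolve("main.xml").toFile();
        System.out.println("with fixup:    " + includedBase(main, true));
        System.out.println("without fixup: " + includedBase(main, false));
    }
}
```

Since the desired output keeps xml:base, the default setting already matches it. Switching the feature off is only useful if those attributes are unwanted.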

Solution

  • Based on our exchange of comments and the advice you got from oXygen support, it looks like using oXygen's patched version of Xerces (available at https://mvnrepository.com/artifact/com.oxygenxml/oxygen-patched-xerces/21.1.0.2) together with Saxon 9.9 HE should enable XPointer-based XInclude that addresses xml:id attributes:

    java -cp 'oxygen-patched-xerces-21.1.0.2.jar;saxon9he.jar' net.sf.saxon.Transform -t -s:input.xml -xsl:sheet.xsl -xi:on
    

    I have used and tested this command line in a Windows 10 PowerShell window. Depending on the platform and command-line shell, you might need different quote characters for the -cp argument and a different separator between the jar files listed there (for example ':' instead of ';' on Linux or macOS).
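One way to sidestep the shell-specific quoting and separator differences is to launch the command from Java with ProcessBuilder. ProcessBuilder passes each argument as-is without going through a shell, and File.pathSeparator supplies the platform's classpath separator (';' on Windows, ':' elsewhere). This is a minimal sketch: the class and method names are hypothetical, the jar names and Saxon options are copied from the commands above, and the jars are assumed to sit in the working directory.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class RunSaxon {

    // Hypothetical helper: builds the argument list for the command from the
    // answer. The jar order simply mirrors that command.
    static List<String> saxonCommand(String source, String stylesheet, String output) {
        String classpath = String.join(File.pathSeparator,
                "oxygen-patched-xerces-21.1.0.2.jar", "saxon9he.jar");
        return Arrays.asList(
                "java", "-cp", classpath,
                "net.sf.saxon.Transform",
                "-t", "-xi:on",
                "-s:" + source, "-xsl:" + stylesheet, "-o:" + output);
    }

    public static void main(String[] args) throws Exception {
        // No shell is involved, so no quoting is needed around the classpath.
        Process p = new ProcessBuilder(saxonCommand("main.xml", "assemble.xslt", "master.xml"))
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```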