Disable escaping XML text passed through command line


General overview

Part of my XML content comes from outside both the source XML file and the XSLT stylesheet, because it is dynamically generated by another process. I therefore pass it to the stylesheet as a parameter on the command line.

The problem

When I pass the XML as a parameter, it shows up in escaped form once it goes through the XSLT: instead of the angle brackets < and >, I get their escaped forms &lt; and &gt;, so I can't process the content as XML.

The files and command - XML:

<?xml version="1.0" encoding="UTF-8"?>
<document>
    <h2>General Introduction</h2>
    <h3>Project Overview</h3>
    <h3>Goals and Challenges</h3>
    <h2>Installation</h2>
    <h3>Initial Configuration</h3>
    <options></options>
    <h4>Configuration File</h4>
    <h3>Deployment</h3>
    <h4>On a Local Server</h4>
    <h4>On a Remote Server</h4>
    <h2>Usage</h2>
    <h3>Basic Commands</h3>
    <h3>Advanced Options</h3>
</document>

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="optionsContent"/> <!-- Parameter supplied on the command line -->

    <xsl:template match="/document">
        <html>
            <body>
                <xsl:apply-templates/> 
                <xsl:apply-templates select="$optionsContent"/> <!-- Apply templates to the parameter content, which contains angle brackets -->
            </body>
        </html>
    </xsl:template>

    <xsl:template match="options">
        <h2>Options</h2>
        <p>Some options</p>
    </xsl:template>
</xsl:stylesheet>

The command line:

java -jar /usr/share/java/Saxon-HE-9.9.1.5.jar \
                -s:test.xml \
                -xsl:test.xslt \
                -o:test.html \
                optionsContent='<options></options>' # ← Here is the relevant part

The output:

<html>
   <body>
      General Introduction
      Project Overview
      Goals and Challenges
      Installation
      Initial Configuration
        
      <h2>Options</h2>
      <p>Some options</p>
      Configuration File
      Deployment
      On a Local Server
      On a Remote Server
      Usage
      Basic Commands
      Advanced Options
      &lt;options&gt;&lt;/options&gt;</body> <!-- ← As you can see, the angle brackets disappear and are replaced by their escape codes -->
</html>

Question

How can I use the contents of optionsContent as XML rather than as flat text with escaped symbols?


Solution

  • The problem is that you are supplying the supplementary XML as a string. Inside the XSLT stylesheet, this parameter is simply a string that happens to contain < and > characters. Note that the <, > and similar characters are escaped as &lt; and &gt; only when the output document is serialized; in other words, the escaping is a feature of serialization, not of the parameter passing.

    What you need to do is parse that string parameter to produce a tree of XML data model nodes. The XPath function parse-xml(), defined in XPath 3.0, does exactly that; Saxon-HE 9.9, which you are using, is an XSLT 3.0 processor and supports it. By contrast, your main source XML document is parsed implicitly by the XSLT processor to produce a tree of nodes.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:param name="optionsContent"/>
    
        <xsl:template match="/document">
            <html>
                <body>
                    <xsl:apply-templates/> 
                    <xsl:apply-templates select="parse-xml($optionsContent)"/>
                </body>
            </html>
        </xsl:template>
    
        <xsl:template match="options">
            <h2>Options</h2>
            <p>Some options</p>
        </xsl:template>
    </xsl:stylesheet>
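
    A slightly more defensive variant of the stylesheet above, offered only as a sketch. It assumes you are free to declare the stylesheet as version 3.0, that the parameter may sometimes be omitted or empty, and that the generated markup may contain several top-level elements rather than exactly one. Under those assumptions it declares the parameter as a string with an empty default, skips parsing when there is nothing to parse, and uses parse-xml-fragment() (also from XPath 3.0), which accepts content that is not a single-rooted document. None of this comes from the original question, so adapt it to what the other process actually produces.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs">

        <!-- Assumption: the parameter is optional, so it defaults to an empty string -->
        <xsl:param name="optionsContent" as="xs:string" select="''"/>

        <xsl:template match="/document">
            <html>
                <body>
                    <xsl:apply-templates/>
                    <!-- Only parse when some markup was actually supplied -->
                    <xsl:if test="normalize-space($optionsContent) != ''">
                        <!-- parse-xml-fragment() returns a document node whose children
                             are the parsed top-level nodes of the supplied string -->
                        <xsl:apply-templates select="parse-xml-fragment($optionsContent)"/>
                    </xsl:if>
                </body>
            </html>
        </xsl:template>

        <xsl:template match="options">
            <h2>Options</h2>
            <p>Some options</p>
        </xsl:template>
    </xsl:stylesheet>
    ```

    The command line stays the same as in the question. If the supplied string is not well-formed, both parse-xml() and parse-xml-fragment() raise a dynamic error, so malformed input from the generating process will stop the transformation rather than be silently passed through as text.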