Tags: xml, xslt, whitespace, xalan

Preserving whitespace from the XSL stylesheet


I'm trying to convert this XML:

<list>
  <unit>
    <data1>a</data1>
    <data2>b</data2>
    <data3>c</data3>
  </unit>
</list>

to this:

<list>
  <unit>
    <category1>
      <data1>a</data1>
      <data2>b</data2>
    </category1>
    <category2>
      <data3>c</data3>
    </category2>
  </unit>
</list>

using XSL. I'm using the following stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:s="some_namespace">


<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="//s:unit" xml:space="preserve">
  <xsl:copy>
  <category1>
    <xsl:apply-templates select="./s:data1"/>
    <xsl:apply-templates select="./s:data2"/>
  </category1>
  <category2>
    <xsl:apply-templates select="./s:data3"/>
  </category2>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

This preserves the indentation inside unit but breaks it relative to list. This is what I get:

  <list>
<unit>
  <category1>
    <data1>a</data1>
    <data2>b</data2>
  </category1>
  <category2>
    <data3>c</data3>
  </category2>
</unit>
  </list>

What am I missing here?


Solution

  • What am I missing here?

    I think it is better not to fight the default indentation of the XSLT processor.

    Most often the combination of <xsl:output indent="yes"/> and <xsl:strip-space elements="*"/> is sufficient for getting good indentation.

    This transformation:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>
    
     <xsl:template match="unit">
      <unit>
          <category1>
           <xsl:apply-templates select="*[not(position() >2)]"/>
          </category1>
          <category2>
           <xsl:apply-templates select="*[position() >2]"/>
          </category2>
      </unit>
     </xsl:template>
    </xsl:stylesheet>
    

    when applied to the provided XML document:

    <list>
          <unit>
            <data1>a</data1>
            <data2>b</data2>
            <data3>c</data3>
          </unit>
    </list>
    

    produces the wanted, well-indented result:

    <list>
      <unit>
        <category1>
          <data1>a</data1>
          <data2>b</data2>
        </category1>
        <category2>
          <data3>c</data3>
        </category2>
      </unit>
    </list>
    

    This same result is produced when the transformation is run with any of the following seven XSLT processors:

    • AltovaXML (XML-SPY).

    • .NET XslCompiledTransform.

    • .NET XslTransform.

    • Saxon 6.5.4.

    • Saxon 9.1.05 (XSLT 2.0 processor).

    • XQSharp/XMLPrime (XSLT 2.0 processor).

    • AltovaXML (for XSLT 2.0).

    The case with MSXML3/4/6 is more complicated: these processors indent with just a newline character, so every element is placed on a new line but starts at the beginning of that line, with no leading spaces.

    For these XSLT processors I use two-pass processing. The first pass is the transformation above. The second pass applies, to the result of the first, one of the XML pretty-printers proposed by Nikolai Grigoriev and available on the XSLT FAQ site maintained by Dave Pawson:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ext="urn:schemas-microsoft-com:xslt"
     exclude-result-prefixes="ext">
     <xsl:output method="xml"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:param name="indent-increment" select="'   '" />
    
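     <!-- Pass 1: apply the default-mode templates and capture the result
          as a result tree fragment (RTF) -->
     <xsl:variable name="vrtfPass1">
      <xsl:apply-templates select="/*"/>
     </xsl:variable>
    
     <!-- Convert the RTF to a node-set so that pass 2 can navigate it -->
     <xsl:variable name="vPass1" select="ext:node-set($vrtfPass1)"/>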
    
     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>
    
     <xsl:template match="/">
      <xsl:apply-templates select="$vPass1/*" mode="pass2"/>
     </xsl:template>
    
    
     <xsl:template match="unit">
      <unit>
          <category1>
           <xsl:apply-templates select="*[not(position() >2)]"/>
          </category1>
          <category2>
           <xsl:apply-templates select="*[position() >2]"/>
          </category2>
      </unit>
     </xsl:template>
    
      <xsl:template match="*" mode="pass2">
         <xsl:param name="indent" select="'&#xA;'"/>
    
         <xsl:value-of select="$indent"/>
         <xsl:copy>
           <xsl:copy-of select="@*" />
           <xsl:apply-templates mode="pass2">
             <xsl:with-param name="indent"
                  select="concat($indent, $indent-increment)"/>
           </xsl:apply-templates>
           <xsl:value-of select="$indent"/>
         </xsl:copy>
      </xsl:template>
    
      <xsl:template match="comment()|processing-instruction()" mode="pass2">
         <xsl:copy />
      </xsl:template>
    
      <!-- WARNING: this removes every whitespace-only text node, which is
           unsafe for mixed content or whitespace-sensitive data. Handle with care -->
      <xsl:template match="text()[normalize-space(.)='']" mode="pass2"/>
    
    </xsl:stylesheet>
    

    When this transformation is applied to the same XML document (provided above), the result has the desired indentation:

    <?xml version="1.0" encoding="UTF-16"?>
    <list>
       <unit>
          <category1>
             <data1>a
             </data1>
             <data2>b
             </data2>
          </category1>
          <category2>
             <data3>c
             </data3>
          </category2>
       </unit>
    </list>
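
    In the output above, the end tag of each text-only element (data1, data2, data3) lands on its own line, because the pass2 template always writes $indent before the end tag. If that is not wanted, the following is a minimal sketch of a variant, not the original pretty-printer. It assumes the pretty-printing is run as a separate, second transformation over the output of the first one (so no node-set() extension is needed), and it writes the closing indent only when the element has child elements. The indent-increment parameter is reused from the stylesheet above; everything else is an illustration.

    ```xml
    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes"/>
     <!-- Drop whitespace-only text nodes of the input before re-indenting -->
     <xsl:strip-space elements="*"/>

     <xsl:param name="indent-increment" select="'   '"/>

     <xsl:template match="*">
      <!-- Starts as a newline and grows by one increment per nesting level -->
      <xsl:param name="indent" select="'&#xA;'"/>

      <xsl:value-of select="$indent"/>
      <xsl:copy>
       <xsl:copy-of select="@*"/>
       <xsl:apply-templates>
        <xsl:with-param name="indent"
             select="concat($indent, $indent-increment)"/>
       </xsl:apply-templates>
       <!-- Put the end tag on its own line only if there are child elements -->
       <xsl:if test="*">
        <xsl:value-of select="$indent"/>
       </xsl:if>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="comment()|processing-instruction()">
      <xsl:copy/>
     </xsl:template>
    </xsl:stylesheet>
    ```

    Text nodes are copied by the built-in rule, so a text-only element keeps its text and end tag on the same line. Mixed content (text next to child elements) still gets extra whitespace inserted, so this sketch is only suitable for data-oriented XML like the example.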
    

    These are all the XSLT processors I have on my computers. I suggest trying the last transformation; chances are that it will produce the wanted result with Xalan-C.

    Do note:

    The last transformation uses the MSXML-specific extension function ext:node-set(), which belongs to an MSXML-specific namespace:

    xmlns:ext="urn:schemas-microsoft-com:xslt"
    

    For Xalan this needs to be replaced with:

    xmlns:ext="http://exslt.org/common"
    

    or, if EXSLT isn't supported, with the native Xalan namespace:

    xmlns:ext="http://xml.apache.org/xalan"
    

    In this last case, the call to the ext:node-set() function must be replaced with a call to ext:nodeset() (note the missing hyphen):

     <xsl:variable name="vPass1" select="ext:nodeset($vrtfPass1)"/>
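
    To show where these changes go, here is a hedged skeleton of the two-pass stylesheet using the EXSLT namespace. It assumes the Xalan build in use supports the EXSLT common module. The pass2 rule here is only a placeholder that copies the captured tree unchanged; the unit template and the pass2 templates from the full stylesheet above would go in its place.

    ```xml
    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ext="http://exslt.org/common"
     exclude-result-prefixes="ext">
     <xsl:output method="xml"/>
     <xsl:strip-space elements="*"/>

     <!-- Pass 1: run the default-mode templates and capture the result -->
     <xsl:variable name="vrtfPass1">
      <xsl:apply-templates select="/*"/>
     </xsl:variable>

     <!-- EXSLT node-set() replaces the MSXML one; the call itself is unchanged -->
     <xsl:variable name="vPass1" select="ext:node-set($vrtfPass1)"/>

     <!-- Identity rule used by pass 1 -->
     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <!-- Pass 2 entry point: process the tree captured in pass 1 -->
     <xsl:template match="/">
      <xsl:apply-templates select="$vPass1/*" mode="pass2"/>
     </xsl:template>

     <!-- Placeholder for pass 2 (illustration only): copies the tree as-is -->
     <xsl:template match="*" mode="pass2">
      <xsl:copy-of select="."/>
     </xsl:template>
    </xsl:stylesheet>
    ```

    With the native Xalan namespace, only the xmlns:ext declaration and the function name (ext:nodeset) change; the rest of the skeleton stays the same.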