
Duplicate structures and substructures simultaneously


My aim is to use XSLT 1.0 to transform an XML file so that specific nodes are duplicated. The problem is that I need to duplicate substructures of structures that themselves need to be duplicated.

Here is the source XML:

<?xml version="1.0" encoding="UTF-8"?>
<parent_node>
    <derived_from_a>
        <element1>test</element1>
        <element2>test</element2>
    </derived_from_a>
    <derived_from_b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
            <element3>test</element3>
            <element4>test</element4>
        </derived_from_c>
    </derived_from_b>
</parent_node>

I need all the <derived_from_a> nodes to be duplicated into a new node with the name <a>. The same should happen with the nodes <derived_from_b> and <derived_from_c> for the new nodes <b> and <c>.

This is the desired output:

<?xml version="1.0" encoding="UTF-8"?>
<parent_node>
    <derived_from_a>
        <element1>test</element1>
        <element2>test</element2>
    </derived_from_a>
    <a>
        <element1>test</element1>
        <element2>test</element2>
    </a>
    <derived_from_b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
            <element3>test</element3>
            <element4>test</element4>
        </derived_from_c>
        <c>
            <element3>test</element3>
            <element4>test</element4>
        </c>
    </derived_from_b>
    <b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
            <element3>test</element3>
            <element4>test</element4>
        </derived_from_c>
        <c>
            <element3>test</element3>
            <element4>test</element4>
        </c>
    </b>
</parent_node>

Note that I am duplicating the node <derived_from_c> into <c> in both parent structures (<derived_from_b> and <b>). That is what I need.

I wrote the following XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- -->
    <xsl:template match="//derived_from_a">
        <!-- first copy structure with original name -->
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
        <!-- then duplicate it -->
        <a>
            <xsl:copy-of select="child::node()"/>
        </a>
    </xsl:template>
    <xsl:template match="//derived_from_b">
        <!-- first copy structure with original name -->
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
        <!-- then duplicate it -->
        <b>
            <xsl:copy-of select="child::node()"/>
        </b>
    </xsl:template>
    <xsl:template match="//derived_from_c">
        <!-- first copy structure with original name -->
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
        <!-- then duplicate it -->
        <c>
            <xsl:copy-of select="child::node()"/>
        </c>
    </xsl:template>
</xsl:stylesheet>

This generates this output:

<?xml version="1.0" encoding="UTF-8"?>
<parent_node>
    <derived_from_a>
        <element1>test</element1>
        <element2>test</element2>
    </derived_from_a>
    <a>
        <element1>test</element1>
        <element2>test</element2>
    </a>
    <derived_from_b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
            <element3>test</element3>
            <element4>test</element4>
        </derived_from_c>
        <c>
            <element3>test</element3>
            <element4>test</element4>
        </c>
    </derived_from_b>
    <b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
            <element3>test</element3>
            <element4>test</element4>
        </derived_from_c>
    </b>
</parent_node>

Note that the node <c> only exists inside <derived_from_b>, not inside <b>. I have tried moving the <derived_from_c> template to the top of my XSLT, without success.

Is there a way I can duplicate everything in one mapping?


Solution

  • This XSLT 1.0 transformation gives the result you are looking for:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:strip-space elements="*" />
      <xsl:output method="xml" indent="yes" />
    
      <xsl:template match="node() | @*" name="identity">
        <xsl:copy>
          <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="*[starts-with(name(), 'derived_from_')]">
        <xsl:call-template name="identity" />
        <xsl:element name="{substring-after(name(), 'derived_from_')}">
          <xsl:apply-templates select="node() | @*" />
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
    

    namely

    <parent_node>
      <derived_from_a>
        <element1>test</element1>
        <element2>test</element2>
      </derived_from_a>
      <a>
        <element1>test</element1>
        <element2>test</element2>
      </a>
      <derived_from_b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
          <element3>test</element3>
          <element4>test</element4>
        </derived_from_c>
        <c>
          <element3>test</element3>
          <element4>test</element4>
        </c>
      </derived_from_b>
      <b>
        <element1>test</element1>
        <element2>test</element2>
        <derived_from_c>
          <element3>test</element3>
          <element4>test</element4>
        </derived_from_c>
        <c>
          <element3>test</element3>
          <element4>test</element4>
        </c>
      </b>
    </parent_node>
    

    The trick here is to give the identity template both a match and a name. This way it can be invoked both implicitly, when <xsl:apply-templates> matches a node as the XSLT processor traverses your input document, and explicitly through <xsl:call-template>.

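    The explicit call copies the current element under its original name. Inside the new element, <xsl:apply-templates> (rather than <xsl:copy-of>, as in your stylesheet) then processes the descendants of the current node instead of just copying them, which addresses your requirement "to duplicate substructures of structures that also need to be duplicated."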
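
    For comparison, here is a minimal sketch of the same idea applied to the stylesheet from the question. It assumes the element names shown there and changes only one thing: each <xsl:copy-of> becomes an <xsl:apply-templates>, so a nested <derived_from_c> inside the new <b> is matched again. The <xsl:strip-space> line is an addition that keeps the indentation tidy.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <!-- identity template: copies every node that no other template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="derived_from_a">
    <!-- first copy the structure under its original name -->
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
    <!-- then duplicate it; apply-templates lets children be matched again -->
    <a>
      <xsl:apply-templates select="@*|node()" />
    </a>
  </xsl:template>

  <xsl:template match="derived_from_b">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
    <b>
      <xsl:apply-templates select="@*|node()" />
    </b>
  </xsl:template>

  <xsl:template match="derived_from_c">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
    <c>
      <xsl:apply-templates select="@*|node()" />
    </c>
  </xsl:template>
</xsl:stylesheet>
```

    This keeps one template per element name, which is more repetitive than the generic version above but may be easier to follow if only a handful of elements are involved.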


    If your output element names (<a>) are not actually substrings of your input element names (<derived_from_a>), you could be tempted to duplicate the worker template.

    In order to avoid that duplication, you could embed a mapping into your XSLT in a custom element, and use a self-reference via document('') to pull out the new name dynamically:

    <xsl:stylesheet version="1.0" 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:my="http://tempuri.org"
    >
      <my:map>
        <name from="derived_from_a" to="x" />
        <name from="derived_from_b" to="y" />
        <name from="derived_from_c" to="z" />
      </my:map>
    
      <!-- ... -->
       
      <xsl:template match="derived_from_a|derived_from_b|derived_from_c">
        <!-- ... -->
        <xsl:element name="{document('')/*/my:map/name[@from = name(current())]/@to}">
          <!-- ... -->
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
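
    Filled in, the skeleton above might look like the following sketch. The <my:map> entries and the http://tempuri.org namespace come from the skeleton; the $new-name variable is an addition for readability, and the elided parts are assumed to be the same identity template and worker logic as in the first stylesheet.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org"
>
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <!-- lookup table embedded in the stylesheet itself -->
  <my:map>
    <name from="derived_from_a" to="x" />
    <name from="derived_from_b" to="y" />
    <name from="derived_from_c" to="z" />
  </my:map>

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="derived_from_a|derived_from_b|derived_from_c">
    <!-- $new-name is a hypothetical helper variable, not part of the original answer -->
    <xsl:variable name="new-name"
      select="document('')/*/my:map/name[@from = name(current())]/@to" />
    <!-- copy under the original name, then again under the mapped name -->
    <xsl:call-template name="identity" />
    <xsl:element name="{$new-name}">
      <xsl:apply-templates select="node() | @*" />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

    Keep in mind that document('') re-reads the stylesheet, so this approach relies on the processor being able to resolve the stylesheet's own base URI; that may not hold if the stylesheet is loaded from an in-memory string.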
    

    Adding configuration with custom elements can be quite useful at times. Another option would be to pass in the mapping through an <xsl:param> from the calling code, depending on the capabilities of the XSLT tooling you're using. A third option would be to use an <xsl:variable> containing an <xsl:choose> to determine the new element name.
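
    The <xsl:choose> variant could be sketched as follows. It reuses the illustrative target names x, y and z from the mapping example; the $new-name variable is a hypothetical name chosen for this sketch.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="derived_from_a|derived_from_b|derived_from_c">
    <!-- pick the new element name based on the current element -->
    <xsl:variable name="new-name">
      <xsl:choose>
        <xsl:when test="self::derived_from_a">x</xsl:when>
        <xsl:when test="self::derived_from_b">y</xsl:when>
        <xsl:otherwise>z</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- copy under the original name, then again under the chosen name -->
    <xsl:call-template name="identity" />
    <xsl:element name="{$new-name}">
      <xsl:apply-templates select="node() | @*" />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

    This avoids the document('') self-reference, at the cost of keeping the mapping inside the template logic rather than in a separate table.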