
XSLT: flat XML to nested hierarchy based on path


I am trying to create a nested hierarchy from flat XML, based on level elements that represent a path. Each level element and the siblings that belong to it (their names and number vary) should be wrapped in a Record element, so that nested levels form a tree structure.

From this source (simplified):

<?xml version="1.0" encoding="UTF-8"?>
<record>
    <level>first</level>
    <unitid>0001</unitid>
    <a-few-more-siblings/>

    <level>first/second</level>
    <unitid>0002</unitid>
    <many-more-siblings/>

    <level>first/second/third</level>
    <unitid>0003a</unitid>
    <some-more-siblings/>

    <level>first/second/third</level>
    <unitid>0003b</unitid>
    <many-more-siblings/>

    <level>first/second/third</level>
    <unitid>0003c</unitid>
    <some-more-siblings/>

    <level>first</level>
    <unitid>0004</unitid>
    <again-more-siblings/>
</record>

I would like to generate the following output:

<Record level="first">
    <level>first</level>
    <unitid>0001</unitid>
    <a-few-more-siblings/>
    <Record level="second">
        <level>second</level>
        <unitid>0002</unitid>
        <many-more-siblings/>
        <Record level="third">
            <level>third</level>
            <unitid>0003a</unitid>
            <some-more-siblings/>
        </Record>
        <Record level="third">
            <level>third</level>
            <unitid>0003b</unitid>
            <many-more-siblings/>
        </Record>
        <Record level="third">
            <level>third</level>
            <unitid>0003c</unitid>
            <some-more-siblings/>
        </Record>
    </Record>
</Record>
<Record level="first">
    <level>first</level>
    <unitid>0004</unitid>
    <again-more-siblings/>
</Record>

The closest I could produce so far is:

<record level="first">
   <level>first</level>
   <unitid>0001</unitid>
   <some-other-siblings/>
   <record level="second">
      <level>first/second</level>
      <unitid>0002</unitid>
      <some-other-siblings/>
      <record level="third">
             <level>first/second</level>
             <unitid>0002</unitid>
             <some-other-siblings/>
         <level>first/second/third</level>
         <unitid>0003a</unitid>
         <some-other-siblings/>
      </record>
      <record level="third">
             <level>first/second</level>
             <unitid>0002</unitid>
             <some-other-siblings/>
             <level>first/second/third</level>
             <unitid>0003a</unitid>
             <some-other-siblings/>
         <level>first/second/third</level>
         <unitid>0003b</unitid>
         <some-other-siblings/>
      </record>
      <record level="third">
         <level>first/second/third</level>
         <unitid>0003c</unitid>
         <some-other-siblings/>
      </record>
   </record>
</record>

(The additionally indented lines inside the third-level records are unwanted siblings repeated from earlier levels, and the second first-level record, 0004, does not appear at all.)

I have tried several variations of the approaches suggested for similar problems ("flat to hierarchical", "following siblings until", etc.), but I end up either with too many siblings output at a certain position or with only the first record on the third level being output.

Any help is greatly appreciated.


Solution

  • One way to do this is to use keys. First, to get the siblings of a level element, you could define a key that groups the non-level elements by their nearest preceding level element (i.e. each group holds all the siblings belonging to that level).

    <xsl:key name="siblings" 
         match="*[not(self::level)]" 
         use="generate-id(preceding-sibling::level[1])" />
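
    To see what the siblings key computes, here is a rough analogue in Python using the standard library's xml.etree.ElementTree (a swapped-in illustration, not XSLT). The names SOURCE and group_siblings are hypothetical, and the input is a trimmed copy of the sample from the question. Each level element is mapped to the non-level elements that follow it up to the next level element, which is the set that key('siblings', generate-id()) returns for that level.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's flat input (hypothetical test data).
SOURCE = """<record>
  <level>first</level><unitid>0001</unitid><a-few-more-siblings/>
  <level>first/second</level><unitid>0002</unitid><many-more-siblings/>
  <level>first</level><unitid>0004</unitid><again-more-siblings/>
</record>"""


def group_siblings(record):
    """Map each <level> child of record to the non-level elements whose
    nearest preceding <level> sibling it is (like key('siblings', ...))."""
    groups = {}
    current = None
    for child in record:
        if child.tag == "level":
            current = child
            groups[current] = []
        elif current is not None:
            # Elements before the first <level> are skipped here; the XSLT
            # key would index them under the empty string instead.
            groups[current].append(child)
    return groups


for level, siblings in group_siblings(ET.fromstring(SOURCE)).items():
    print(level.text, [s.tag for s in siblings])
```

    Keying on the level element itself (rather than on its text) keeps the two 'first' entries apart, much as generate-id() does in the XSLT version.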
    

    You could also define a key to get the immediate 'descendants' (child levels) of a level element, i.e. group each level element by the nearest preceding level element whose path, followed by '/', is the start of its own path.

    <xsl:key name="nextlevel" 
         match="level" 
         use="generate-id(preceding-sibling::level[starts-with(current(), concat(., '/'))][1])" />
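
    The rule behind the nextlevel key can be sketched the same way. This Python version (hypothetical names, no libraries needed) works on the level paths in document order and, for each one, looks backwards for the nearest earlier path that, followed by '/', is a prefix of it. A result of None corresponds to the empty key value that the top-level entries get in the XSLT.

```python
# Level paths from the question's sample, in document order.
PATHS = [
    "first",               # 0001
    "first/second",        # 0002
    "first/second/third",  # 0003a
    "first/second/third",  # 0003b
    "first/second/third",  # 0003c
    "first",               # 0004
]


def parent_index(paths, i):
    """Return the index of the nearest preceding path that is a parent of
    paths[i] (paths[j] + '/' is a prefix of paths[i]), or None."""
    for j in range(i - 1, -1, -1):
        if paths[i].startswith(paths[j] + "/"):
            return j
    return None


print([parent_index(PATHS, i) for i in range(len(PATHS))])
# [None, 0, 1, 1, 1, None]
```

    Appending '/' before the prefix test is what stops a path such as first/sec from being treated as the parent of first/second.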
    

    In your XSLT you would then start off by selecting the 'first' level elements:

    <xsl:apply-templates select="level[. = 'first']" />
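
    In the sample this select returns both 'first' level elements, so the 0004 entry is picked up as well. As a rough Python illustration (standard library only; the helper name is hypothetical, and so is the generalisation of treating any path without '/' as top level instead of comparing with the literal 'first', which in XPath would be level[not(contains(., '/'))]):

```python
import xml.etree.ElementTree as ET

SOURCE = """<record>
  <level>first</level><unitid>0001</unitid>
  <level>first/second</level><unitid>0002</unitid>
  <level>first/second/third</level><unitid>0003a</unitid>
  <level>first</level><unitid>0004</unitid>
</record>"""


def top_level_unitids(record):
    """Return the unitid of every top-level entry, in document order.

    A level counts as top level here when its path contains no '/'
    (a hypothetical generalisation of level[. = 'first']). Assumes the
    unitid element immediately follows its level element, as in the sample.
    """
    children = list(record)
    return [
        children[i + 1].text
        for i, child in enumerate(children)
        if child.tag == "level" and "/" not in child.text
    ]


print(top_level_unitids(ET.fromstring(SOURCE)))  # ['0001', '0004']
```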
    

    You would then have a generic template matching level elements, which uses both keys to output the siblings and then the next-level elements:

    <xsl:template match="level">
        <Record level="{.}">
            <xsl:copy-of select="." />
            <xsl:apply-templates select="key('siblings', generate-id())" />
            <xsl:apply-templates select="key('nextlevel', generate-id())" />
        </Record>
    </xsl:template>
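
    Putting both keys together, the template does three things for each level element: copy it, copy its own siblings, then recurse into its child levels. A rough Python analogue of that recursion, using xml.etree.ElementTree from the standard library, is sketched below. The function and variable names are hypothetical, and unlike the XSLT it starts from every level that has no parent level rather than from the literal 'first'.

```python
import copy
import xml.etree.ElementTree as ET

SOURCE = """<record>
  <level>first</level><unitid>0001</unitid><a-few-more-siblings/>
  <level>first/second</level><unitid>0002</unitid><many-more-siblings/>
  <level>first/second/third</level><unitid>0003a</unitid><some-more-siblings/>
  <level>first/second/third</level><unitid>0003b</unitid><many-more-siblings/>
  <level>first/second/third</level><unitid>0003c</unitid><some-more-siblings/>
  <level>first</level><unitid>0004</unitid><again-more-siblings/>
</record>"""


def clone(element):
    """Deep-copy an element (like xsl:copy-of), dropping its trailing whitespace."""
    dup = copy.deepcopy(element)
    dup.tail = None
    return dup


def nest(record):
    """Return the top-level <Record> elements built from the flat record."""
    children = list(record)
    levels = [i for i, c in enumerate(children) if c.tag == "level"]

    def siblings(i):
        # Like key('siblings', ...): the non-level elements up to the next <level>.
        out = []
        for c in children[i + 1:]:
            if c.tag == "level":
                break
            out.append(c)
        return out

    def parent(i):
        # Like the 'nextlevel' key's use expression: nearest earlier level
        # whose path plus '/' is a prefix of this level's path, or None.
        for j in reversed([k for k in levels if k < i]):
            if children[i].text.startswith(children[j].text + "/"):
                return j
        return None

    def build(i):
        # Like the 'level' template: wrap the level, its siblings and its child levels.
        rec = ET.Element("Record", level=children[i].text)
        rec.append(clone(children[i]))
        for s in siblings(i):
            rec.append(clone(s))
        for k in levels:
            if parent(k) == i:
                rec.append(build(k))
        return rec

    # Like the 'record' template: start from the levels that have no parent level.
    return [build(i) for i in levels if parent(i) is None]


for rec in nest(ET.fromstring(SOURCE)):
    print(ET.tostring(rec, encoding="unicode"))
```

    Each level is visited once as a child of its parent, so no sibling is copied into more than one Record, which is the duplication problem described in the question.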
    

    Try the following XSLT

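    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

        <!-- Group every non-level element by its nearest preceding level element -->
        <xsl:key name="siblings" match="*[not(self::level)]" use="generate-id(preceding-sibling::level[1])" />

        <!-- Group every level element by its parent level: the nearest preceding
             level whose path, followed by '/', is the start of its own path -->
        <xsl:key name="nextlevel" match="level" use="generate-id(preceding-sibling::level[starts-with(current(), concat(., '/'))][1])" />

        <!-- Start the hierarchy at the top-level ('first') level elements -->
        <xsl:template match="record">
            <xsl:apply-templates select="level[. = 'first']" />
        </xsl:template>

        <!-- Wrap a level, its own siblings and its child levels in a Record element -->
        <xsl:template match="level">
            <Record level="{.}">
                <xsl:copy-of select="." />
                <xsl:apply-templates select="key('siblings', generate-id())" />
                <xsl:apply-templates select="key('nextlevel', generate-id())" />
            </Record>
        </xsl:template>

        <!-- Identity template: copies the sibling elements unchanged -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    </xsl:stylesheet>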
    

    When applied to your XML, the following is output

    <Record level="first">
        <level>first</level>
        <unitid>0001</unitid>
        <a-few-more-siblings/>
        <Record level="first/second">
            <level>first/second</level>
            <unitid>0002</unitid>
            <many-more-siblings/>
            <Record level="first/second/third">
                <level>first/second/third</level>
                <unitid>0003a</unitid>
                <some-more-siblings/>
            </Record>
            <Record level="first/second/third">
                <level>first/second/third</level>
                <unitid>0003b</unitid>
                <many-more-siblings/>
            </Record>
            <Record level="first/second/third">
                <level>first/second/third</level>
                <unitid>0003c</unitid>
                <some-more-siblings/>
            </Record>
        </Record>
    </Record>
    <Record level="first">
        <level>first</level>
        <unitid>0004</unitid>
        <again-more-siblings/>
    </Record>
    

    This isn't exactly the output you show as expected: here the level attribute and the level elements keep the full path (for example first/second) rather than only its last step. Also, if you want both 'first' level elements wrapped in a single Record element (rather than in separate Record elements, as is done for the 'third' level elements), try replacing the template that matches record with these two templates instead:

    <xsl:template match="record">
        <Record level="first">
            <xsl:apply-templates select="level[. = 'first']" />
        </Record>
    </xsl:template>
    
    <xsl:template match="level[. = 'first']">
        <xsl:copy-of select="." />
        <xsl:apply-templates select="key('siblings', generate-id())" />
        <xsl:apply-templates select="key('nextlevel', generate-id())" />
    </xsl:template>