
How can I merge nodes based on the value of an attribute?


After a first XSL transformation I have an XML output similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<analysis type="1">
    <file path="a.txt">
        <line nb="23" found="true"/>
        <line nb="36" found="true" count="2"/>
        <line nb="98" found="true"/>
    </file>
    <file path="a.txt">
        <line nb="100" found="false"/>
    </file>
    <file path="b.txt">
        <line nb="10" found="false"/>
    </file>
    <!-- more file nodes below with different @path -->
</analysis>

Now I need to obtain a second output in which file nodes with the same path attribute are merged, as follows:

<?xml version="1.0" encoding="UTF-8"?>
<analysis type="1">
    <file path="a.txt">
        <line nb="23" found="true"/>
        <line nb="36" found="true" count="2"/>
        <line nb="98" found="true"/>
        <line nb="100" found="false"/>
    </file>
    <file path="b.txt">
        <line nb="10" found="false"/>
    </file>
</analysis>

I don't know the possible @path values in advance.

I looked at multiple posts about merging nodes but could not find a way to do what I want. I'm lost with node grouping, keys, id generation... and have only obtained error messages so far.

Could you please help me get the second output starting from the first one (with XSLT 1.0)? And if you could provide some references (websites) where I could find explanations of this kind of transformation, that would be really great.

Note: the @nb attributes of line nodes belonging to file nodes with the same @path never collide; each value is unique. That is, this will never happen:

<?xml version="1.0" encoding="UTF-8"?>
<analysis type="1">
    <file path="a.txt">
        <line nb="36" found="true" count="2"/>
    </file>
    <file path="a.txt">
        <line nb="36" found="true"/>
    </file>
</analysis>

Thanks a lot for your help!


Solution

  • XSLT 1.0 without keys

    Since you state in your question that you have trouble understanding keys, here is one way of doing it without keys, using a technique called sibling recursion. It is generally considered inferior to using keys because it relies on the sibling axes (preceding-sibling and following-sibling), which are typically slow on large inputs. However, in most practical situations, you will not notice the difference:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
    
        <xsl:template match="node() | @*">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template match="analysis">
            <xsl:copy>
                <xsl:copy-of select="@*" />
                <xsl:apply-templates select="file[not(preceding-sibling::file/@path = @path)]" mode="sibling-recurse" />
            </xsl:copy>
        </xsl:template>
    
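        <!-- first file of each @path group: emit the merged element -->
        <xsl:template match="file" mode="sibling-recurse">
            <xsl:copy>
                <!-- back to default mode: copy this file's own attributes and children -->
                <xsl:apply-templates select="node() | @*" />
                <!-- then pull in the later files with the same @path -->
                <xsl:apply-templates select="following-sibling::file[current()/@path = @path]" />
            </xsl:copy>
        </xsl:template>
    
        <!-- later files of a group (default mode): output only their children -->
        <xsl:template match="file">
            <xsl:apply-templates select="node()" />
        </xsl:template>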
    </xsl:stylesheet>
    

  • XSLT 1.0 with keys for Muenchian grouping

    This approach uses Muenchian grouping, which is explained elsewhere (follow one of the tutorials on it with this code in hand). It also uses the sibling axis, but in a far less costly way: the key finds the first file of each group, so the sibling axis is traversed once per group rather than for every file element.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
    
        <xsl:key match="file" use="@path" name="path" />
    
        <xsl:template match="node() | @*">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template match="analysis">
            <xsl:copy>
                <xsl:copy-of select="@*" />
                <xsl:apply-templates select="file[generate-id(.) = generate-id(key('path', @path))]" mode="sibling-recurse" />
            </xsl:copy>
        </xsl:template>
    
        <xsl:template match="file" mode="sibling-recurse">
            <xsl:copy>
                <!-- back to default mode -->
                <xsl:apply-templates select="node() | @*" />
                <xsl:apply-templates select="following-sibling::file[@path = current()/@path]/node()" />
            </xsl:copy>
        </xsl:template>
    
    </xsl:stylesheet>
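
    For comparison, here is a sketch of a common variant that fetches the members of each group through the key itself instead of the following-sibling axis. It is not part of the stylesheet above; the mode name "group" is made up for this example, and it assumes all file elements sit under a single analysis element, as in the question's sample (a key is document-wide, so files with the same @path under different parents would also be merged).

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">

        <!-- index every file element by its @path -->
        <xsl:key name="path" match="file" use="@path" />

        <!-- identity template: copies anything not handled below -->
        <xsl:template match="node() | @*">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <xsl:template match="analysis">
            <xsl:copy>
                <xsl:copy-of select="@*" />
                <!-- keep only the first file of each @path group -->
                <xsl:apply-templates
                    select="file[generate-id(.) = generate-id(key('path', @path)[1])]"
                    mode="group" />
            </xsl:copy>
        </xsl:template>

        <xsl:template match="file" mode="group">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <!-- children of every file with this @path, in document order -->
                <xsl:apply-templates select="key('path', @path)/node()" />
            </xsl:copy>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Here key('path', @path) returns the whole group, including the current file, so no sibling axis is needed at all.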
    

    Note: for both approaches, the mode-switching is not entirely necessary, but it makes it easier to write simple match patterns and prevents priority conflicts or hard-to-find bugs (imo).
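
    To illustrate what the modes buy you, here is a sketch of how a mode-less version might look. It is an assumption about one possible rewrite, not code from the approaches above, and it depends on the default priority rules of XSLT 1.0: a pattern with a predicate gets priority 0.5 and therefore wins over the bare file pattern, which gets priority 0.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">

        <xsl:key name="path" match="file" use="@path" />

        <!-- identity template: copies anything not handled below -->
        <xsl:template match="node() | @*">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- first file of each @path group (default priority 0.5) -->
        <xsl:template match="file[generate-id(.) = generate-id(key('path', @path)[1])]">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:apply-templates select="key('path', @path)/node()" />
            </xsl:copy>
        </xsl:template>

        <!-- every other file (default priority 0): already merged, output nothing -->
        <xsl:template match="file" />

    </xsl:stylesheet>
    ```

    This is shorter, but its correctness hinges on the priority difference between the two file templates, which is the kind of subtlety the mode-based versions avoid.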