xslt-2.0 xslt-grouping

How to keep the order of elements from the first file when merging two XML files using XSLT 2.0


I have two XML files, each with a header, sections and a trailer. Each section has a section header, section details and a section trailer. I need to merge the two files at two levels: first at the section level and then at the section detail level. The result should be based on the first file (headers and trailers come from the first file). If a section matches, I need to keep the ordering of section details from the first file (there is no sort key for ordering, just the order of occurrence). If a section is not in the first file, I need to add the whole section from the second file.

I have an XSL stylesheet that gives me the results, but the ordering is not correct, and I need help ordering them. I did not try a key lookup because I was not sure how to account for sections that are not in the first file. When the SectionDetails match, I need the records from the first file to appear before the records from the second file.

My first file, FileA, is here:

<FileRecord>
    <HeaderRecord>
        <A>FileA</A>
    </HeaderRecord>
    <SectionRecord Subject="Science">
        <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
        <SectionDetails Stream="Physics">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Chemistry">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Biology">
            <A>FileA</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
    </SectionRecord>
    <SectionRecord Subject="Math">
        <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
        <SectionDetails Stream="Algebra">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Calculus">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Geometry">
            <A>FileA</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
    </SectionRecord>
    <TrailerRecord>
        <A>FileA</A>
    </TrailerRecord>
</FileRecord>

The second file, FileB, is here:

<FileRecord>
    <HeaderRecord>
        <A>FileB</A>
    </HeaderRecord>
    <SectionRecord Subject="Science">
        <SectionHeader>
            <A>FileB</A>
        </SectionHeader>
        <SectionDetails Stream="Chemistry">
            <A>FileB</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileB</A>
        </SectionTrailer>
    </SectionRecord>
    <SectionRecord Subject="Math">
        <SectionHeader>
            <A>FileB</A>
        </SectionHeader>
        <SectionDetails Stream="Geometry">
            <A>FileB</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileB</A>
        </SectionTrailer>
    </SectionRecord>
    <SectionRecord Subject="History">
        <SectionHeader>
            <A>FileB</A>
        </SectionHeader>
        <SectionDetails Stream="Ancient">
            <A>FileB</A>
        </SectionDetails>
        <SectionDetails Stream="Modern">
            <A>FileB</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileB</A>
        </SectionTrailer>
    </SectionRecord>
    <TrailerRecord>
        <A>FileB</A>
    </TrailerRecord>
</FileRecord>

And the xsl I am using is here

<xsl:stylesheet version="2.0"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="xsd xsi xsl">
    <xsl:param name="filebrecs" select="document('FileB.xml')"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <xsl:apply-templates select="FileRecord"/>
    </xsl:template>

    <xsl:template match="FileRecord">
        <FileRecord>
            <xsl:apply-templates select="HeaderRecord"/>
            <xsl:for-each-group select="SectionRecord, $filebrecs/FileRecord/SectionRecord" group-by="@Subject">
                <SectionRecord>
                    <xsl:attribute name="Subject"><xsl:value-of select="current-grouping-key()"/> </xsl:attribute>
                    <xsl:apply-templates select="current-group()[1]/SectionHeader"/>
                    <xsl:for-each-group select="current-group()//SectionDetails" group-by="@Stream">
                        <xsl:for-each select="current-group()">
                            <xsl:apply-templates select="."/>
                        </xsl:for-each>
                    </xsl:for-each-group>
                    <xsl:apply-templates select="current-group()[1]/SectionTrailer"/>
                </SectionRecord>
            </xsl:for-each-group>
            <xsl:apply-templates select="TrailerRecord"/>
        </FileRecord>
    </xsl:template>

</xsl:stylesheet>

I am expecting result like this

<FileRecord>
    <HeaderRecord>
        <A>FileA</A>
    </HeaderRecord>
    <SectionRecord Subject="Science">
        <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
        <SectionDetails Stream="Physics">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Chemistry">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Chemistry">
            <A>FileB</A>
        </SectionDetails>
        <SectionDetails Stream="Biology">
            <A>FileA</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
    </SectionRecord>
    <SectionRecord Subject="Math">
        <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
        <SectionDetails Stream="Algebra">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Calculus">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Geometry">
            <A>FileA</A>
        </SectionDetails>
        <SectionDetails Stream="Geometry">
            <A>FileB</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
    </SectionRecord>
    <SectionRecord Subject="History">
        <SectionHeader>
            <A>FileB</A>
        </SectionHeader>
        <SectionDetails Stream="Ancient">
            <A>FileB</A>
        </SectionDetails>
        <SectionDetails Stream="Modern">
            <A>FileB</A>
        </SectionDetails>
        <SectionTrailer>
            <A>FileB</A>
        </SectionTrailer>
    </SectionRecord>

    <TrailerRecord>
        <A>FileA</A>
    </TrailerRecord>
</FileRecord>

The actual result that I am getting is

<?xml version = '1.0' encoding = 'UTF-8'?>
<FileRecord>
   <HeaderRecord>
        <A>FileA</A>
    </HeaderRecord>
   <SectionRecord Subject="Science">
      <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
      <SectionDetails Stream="Physics">
            <A>FileA</A>
        </SectionDetails>
      <SectionDetails Stream="Chemistry">
            <A>FileB</A>
        </SectionDetails>
      <SectionDetails Stream="Chemistry">
            <A>FileA</A>
        </SectionDetails>
      <SectionDetails Stream="Biology">
            <A>FileA</A>
        </SectionDetails>
      <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
   </SectionRecord>
   <SectionRecord Subject="Math">
      <SectionHeader>
            <A>FileA</A>
        </SectionHeader>
      <SectionDetails Stream="Geometry">
            <A>FileB</A>
        </SectionDetails>
      <SectionDetails Stream="Geometry">
            <A>FileA</A>
        </SectionDetails>
      <SectionDetails Stream="Algebra">
            <A>FileA</A>
        </SectionDetails>
      <SectionDetails Stream="Calculus">
            <A>FileA</A>
        </SectionDetails>
      <SectionTrailer>
            <A>FileA</A>
        </SectionTrailer>
   </SectionRecord>
   <SectionRecord Subject="History">
      <SectionHeader>
            <A>FileB</A>
        </SectionHeader>
      <SectionDetails Stream="Ancient">
            <A>FileB</A>
        </SectionDetails>
      <SectionDetails Stream="Modern">
            <A>FileB</A>
        </SectionDetails>
      <SectionTrailer>
            <A>FileB</A>
        </SectionTrailer>
   </SectionRecord>
   <TrailerRecord>
        <A>FileA</A>
    </TrailerRecord>
</FileRecord>

In the section with Subject="Science", the ordering of Physics, Chemistry and Biology came out correct, but I want the FileA record to appear before the FileB record. In the section for Math, Geometry showed up before Algebra and Calculus. I want the streams to appear in the order of FileA (and the FileA record to appear before the FileB record). Why did it mess up the ordering for Math but not for Science?

Also, I don't like using a hard-coded index to access the first file's records, as in <xsl:apply-templates select="current-group()[1]/SectionHeader"/>

Is there a better way of doing it?


Solution

  • Try changing

        <xsl:for-each-group select="current-group()//SectionDetails" group-by="@Stream">
            <xsl:for-each select="current-group()">
                <xsl:apply-templates select="."/>
            </xsl:for-each>
        </xsl:for-each-group>
    

    to

        <xsl:for-each-group select="for $rec in current-group() return $rec/SectionDetails" group-by="@Stream">
            <xsl:apply-templates select="current-group()"/>
        </xsl:for-each-group>
    

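    Using a for ... return expression should preserve the order of the outer population when you deal with nodes from different documents. With current-group()//SectionDetails, the path operator returns its result in document order, and the relative order of nodes from different documents is implementation-dependent, so you cannot rely on FileA's nodes coming before FileB's.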

    As for simplifying <xsl:apply-templates select="current-group()[1]/SectionHeader"/>: inside xsl:for-each-group, the context item is the first item of the current group, so instead of current-group()[1] you can simply use ., e.g. ./SectionHeader, which can of course be shortened to SectionHeader, i.e. <xsl:apply-templates select="SectionHeader"/>

    I am not sure, however, why you use XSLT 2 constructs like for-each-group and then, in the comment on the other answer, mention Xalan, which is an XSLT 1 processor and does not support for-each-group.
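
    Putting both suggestions together, the stylesheet from the question might look roughly like the sketch below. It assumes an XSLT 2.0 processor and that FileB.xml sits next to the stylesheet, as in the question. The attribute value template for Subject is only a shorthand for the original xsl:attribute, and the unused xsd and xsi namespace declarations are left out. Treat it as an untested sketch rather than a verified solution.

    ```xml
    <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- Second input document; file name taken from the question -->
        <xsl:param name="filebrecs" select="document('FileB.xml')"/>

        <!-- Identity template: copies anything not handled explicitly -->
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <xsl:template match="/">
            <xsl:apply-templates select="FileRecord"/>
        </xsl:template>

        <xsl:template match="FileRecord">
            <FileRecord>
                <xsl:apply-templates select="HeaderRecord"/>
                <!-- The comma operator keeps FileA's sections ahead of FileB's -->
                <xsl:for-each-group select="SectionRecord, $filebrecs/FileRecord/SectionRecord"
                                    group-by="@Subject">
                    <SectionRecord Subject="{current-grouping-key()}">
                        <!-- The context item is the first SectionRecord of the group -->
                        <xsl:apply-templates select="SectionHeader"/>
                        <!-- for ... return keeps the order of current-group()
                             instead of re-sorting into document order -->
                        <xsl:for-each-group select="for $rec in current-group() return $rec/SectionDetails"
                                            group-by="@Stream">
                            <xsl:apply-templates select="current-group()"/>
                        </xsl:for-each-group>
                        <xsl:apply-templates select="SectionTrailer"/>
                    </SectionRecord>
                </xsl:for-each-group>
                <xsl:apply-templates select="TrailerRecord"/>
            </FileRecord>
        </xsl:template>

    </xsl:stylesheet>
    ```

    With the sample files, the inner population for Science is then Physics, Chemistry, Biology from FileA followed by Chemistry from FileB, so the Chemistry group should list the FileA record first. If an XSLT 3.0 processor is available, the simple map operator should give the same ordering with select="current-group() ! SectionDetails".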