Tags: xslt, xslt-2.0, xslt-grouping

xslt: select unique node via intermediate reference node?


XSLT 2.0

Hi, I have an XML document with 3 nodes, named from the point of view of the 'children': Children, Fathers and MothersFathers. Starting from the Fathers node, I need to find each child's MothersFather node based on the IDs in the Child nodes (the Child node is the intermediate reference joining the other two).

So, for each Father, get his children's distinct MothersFathers. These aren't humans: a father could have hundreds of children but only twenty or so related MothersFathers :)

Simplified version of the XML (in real life I have about 80 Father nodes, 3000 Child nodes and 400 MothersFather nodes):

<t>
<Children>
    <Child>
        <ID>1</ID>
        <FathersID>100</FathersID>
        <MothersFatherID>200</MothersFatherID>    
    </Child>
    <Child>
        <ID>2</ID>
        <FathersID>100</FathersID>
        <MothersFatherID>201</MothersFatherID>    
    </Child>
    <Child>
        <ID>3</ID>
        <FathersID>100</FathersID>
        <MothersFatherID>202</MothersFatherID>    
    </Child>
    <Child>
        <ID>4</ID>
        <FathersID>100</FathersID>
        <MothersFatherID>201</MothersFatherID>    
    </Child>
    <Child>
        <ID>5</ID>
        <FathersID>101</FathersID>
        <MothersFatherID>201</MothersFatherID>    
    </Child>
</Children>
<Fathers>
    <Father>
        <ID>100</ID>
    </Father>
    <Father>
        <ID>101</ID>
    </Father>
</Fathers>
<MothersFathers>
    <MothersFather>
        <ID>200</ID>
    </MothersFather>
    <MothersFather>
        <ID>201</ID>
    </MothersFather>
    <MothersFather>
        <ID>202</ID>
    </MothersFather>
</MothersFathers>        
</t>

My XSLT looks like this:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kFathersChildren" match="Child" use="FathersID"/>

    <xsl:template match="/">
        <xsl:apply-templates select="//Fathers"></xsl:apply-templates>
    </xsl:template>

    <xsl:template match="Fathers">
        <xsl:apply-templates select="Father"></xsl:apply-templates>
    </xsl:template>

    <xsl:template match="Father">
        <xsl:text>&#10;FATHER: ID=</xsl:text><xsl:value-of select="ID"/>
        <!-- Now show all this fathers childrens maternal grandfathers based on the ID in the Child node -->

        <!--TRY 1: this works, as in gets the right nodes, but doesn't do distinct values....--> 
        <xsl:for-each select="key('kFathersChildren', ID)">  <!-- get the fathers children --> 
            <xsl:text>&#10; found child: current MFid=</xsl:text><xsl:value-of select="current()/MothersFatherID"/>
            <xsl:text> ID=</xsl:text><xsl:value-of select="ID"/>
            <xsl:apply-templates select="//MothersFathers/MothersFather[ID=current()/MothersFatherID]"></xsl:apply-templates>
        </xsl:for-each>

        <!-- *** THIS IS WHERE I GET LOST??? - Do the same thing but only get distinct MothersFatherID's... -->

        <!--TRY 2: note- won't compile in current state... -->
        <xsl:for-each select="distinct-values(key('kFathersChildren', ID)[MothersFatherID])">  
            <xsl:text>&#10;  Distinct MothersFatherID ???? - don't know what to select </xsl:text><xsl:value-of select="."/>
            <xsl:apply-templates select="//MothersFathers/MothersFather[ID=??????????"></xsl:apply-templates>
        </xsl:for-each>
    </xsl:template>

    <xsl:template match="//MothersFathers/MothersFather">
        <xsl:text>&#10;      IN MothersFather template... ID=</xsl:text><xsl:value-of select="ID"/>
    </xsl:template>
</xsl:stylesheet>

In TRY 1 I can get all the nodes and MothersFatherIDs. The output of TRY 1 is:

FATHER: ID=100
 found child: current MFid=200 ID=1
      IN MothersFather template... ID=200
 found child: current MFid=201 ID=2
      IN MothersFather template... ID=201
 found child: current MFid=202 ID=3
      IN MothersFather template... ID=202
 found child: current MFid=201 ID=4
      IN MothersFather template... ID=201
FATHER: ID=101
 found child: current MFid=201 ID=5
      IN MothersFather template... ID=201

In TRY 2, where I'm selecting distinct values, I would like output like:

FATHER: ID=100
      IN MothersFather template... ID=201
      IN MothersFather template... ID=200
      IN MothersFather template... ID=202
FATHER: ID=101
      IN MothersFather template... ID=201

(This is not real output, just debug text showing that I can reference the right nodes.)

BUT I can't figure out what I'm meant to use to reference the unique MothersFatherID to pass to the 'apply-templates' call.

No matter what I've tried, I get variations on errors like "Required item type of first operand of '/' is node(); supplied value has item type xs:anyAtomicType" or "Axis step child::element('':MothersFatherID) cannot be used here: the context item is an atomic value". I think they mean I'm trying to select nodes where a string value is used, or vice versa... maybe my use of the distinct-values() function is altogether wrong?
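The errors come from the fact that distinct-values() returns atomic values (here, strings), not nodes. Inside an xsl:for-each over those values the context item is a string, so a relative step such as MothersFatherID, or a path starting with //, has no node to navigate from. A minimal sketch of one way to repair TRY 2, assuming an XSLT 2.0 processor: select the MothersFatherID elements with a path step (/MothersFatherID, not the predicate [MothersFatherID], which keeps whole Child elements), save the document node in a variable while the context is still a node, and compare against current() inside the loop. The variable name vDoc is illustrative, not from the question.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:key name="kFathersChildren" match="Child" use="FathersID"/>

    <xsl:template match="/">
        <xsl:apply-templates select="t/Fathers/Father"/>
    </xsl:template>

    <xsl:template match="Father">
        <xsl:text>&#10;FATHER: ID=</xsl:text><xsl:value-of select="ID"/>
        <!-- Remember the document node while the context item is still a node -->
        <xsl:variable name="vDoc" select="root(.)"/>
        <!-- '/MothersFatherID' selects the ID elements; distinct-values() turns them into unique atomic values -->
        <xsl:for-each select="distinct-values(key('kFathersChildren', ID)/MothersFatherID)">
            <!-- Here '.' and current() are atomic values such as '201', not nodes -->
            <xsl:apply-templates select="$vDoc/t/MothersFathers/MothersFather[ID = current()]"/>
        </xsl:for-each>
    </xsl:template>

    <xsl:template match="MothersFather">
        <xsl:text>&#10;      IN MothersFather template... ID=</xsl:text><xsl:value-of select="ID"/>
    </xsl:template>
</xsl:stylesheet>
```

The order of the values returned by distinct-values() is implementation-dependent, so an xsl:sort inside the xsl:for-each is needed if the order matters.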

Can anyone shed some light on how to do this, please? (I keep hoping XSLT will give me some zen moment of enlightenment after which I won't get stuck on this sort of thing.)

Additionally, once I have that going, I'm going to want the MothersFathers in sorted order for each Father. In the real XML there is a 'Name' associated with each 'ID'; hopefully the xsl:sort statement will use the same kind of reference as whatever fixes the problem above?
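A sketch of how the sort could look, assuming XSLT 2.0 and assuming each MothersFather element has a Name child as described (Name is not in the sample XML, and the kMFById key is an addition, not part of the question). The idea is to apply templates to the MothersFather nodes themselves rather than to strings, so that xsl:sort is evaluated against each node's own children.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:key name="kFathersChildren" match="Child" use="FathersID"/>
    <!-- Added key (assumption): look up a MothersFather element by its ID -->
    <xsl:key name="kMFById" match="MothersFather" use="ID"/>

    <xsl:template match="/">
        <xsl:apply-templates select="t/Fathers/Father"/>
    </xsl:template>

    <xsl:template match="Father">
        <xsl:text>&#10;FATHER: ID=</xsl:text><xsl:value-of select="ID"/>
        <!-- key() accepts a sequence of values and returns each matching node once, in document order -->
        <xsl:apply-templates select="key('kMFById', distinct-values(key('kFathersChildren', ID)/MothersFatherID))">
            <!-- Assumption: MothersFather has a Name child; evaluated once per MothersFather node -->
            <xsl:sort select="Name"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="MothersFather">
        <xsl:text>&#10;      IN MothersFather template... ID=</xsl:text><xsl:value-of select="ID"/>
    </xsl:template>
</xsl:stylesheet>
```

With the sample XML, which has no Name elements, all sort keys are empty and compare equal, so the stable sort leaves the MothersFather elements in document order.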

Thanks for your time. Bryce.

EDIT:

Wow!! Thank you for your answer, Dimitre. I have gone over it and was hoping you might break it down a bit for me, as I don't fully grok it. The answer was the stylesheet shown in the Solution below.

I get the use of the keys involved.

The line <xsl:template match="Children|MothersFathers|text()"/> - how is this line doing its thing? If I step it through a debugger it just jumps straight past this line. If I comment it out there is lots of superfluous output that I can't see the source of.

And the apply-templates line that gives the MothersFather node, <xsl:apply-templates select= "key('kMFById', key('kMFByFId', ID)[generate-id(..) = generate-id(key('ChildByFIdAndMFId', concat(../FathersID,'+',.))[1] ) ] )"> - I've been trying to break this down on paper to see the magic but am not quite getting it. It is something like: key('kMFById', key('kMFByFId', ID) means get the matching MothersFather nodes by the current Father ID, where [generate-id(..) is the generated id of '(dot dot)' - something to do with a parent node? Which one? - equals the generated id based on the ChildByFIdAndMFId key [1]. Does this [1] get only the first occurrence of the matching generated ids, thereby giving my distinct value?

(This answer by Dimitre is also very similar to JLRishie's answer. His sort appears to work; am I missing something there, Dimitre?)

Regards, Bryce.


Solution

  • This transformation (shorter, well formatted, and readable without horizontal or vertical scrolling; also, unlike the other answers, it applies sorting correctly):

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="text"/>
    
     <xsl:key name="kMFByFId" match="MothersFatherID"
              use="../FathersID"/>
    
     <xsl:key name="kMFById" match="MothersFather" use="ID"/>
    
     <xsl:key name="ChildByFIdAndMFId" match="Child"
      use="concat(FathersID, '+', MothersFatherID)"/>
    
     <xsl:template match="Children|MothersFathers|text()"/>
    
     <xsl:template match="Father">
       Father ID=<xsl:value-of select="ID"/>
      <xsl:apply-templates select=
       "key('kMFById',
             key('kMFByFId', ID)
              [generate-id(..)
              =
               generate-id(key('ChildByFIdAndMFId',
                                concat(../FathersID,'+',.)
                              )[1]
                           )
              ]
            )">
         <xsl:sort select="ID" data-type="number"/>
       </xsl:apply-templates>
     </xsl:template>
    
     <xsl:template match="MothersFather">
          MothersFather ID=<xsl:value-of select="ID"/>
     </xsl:template>
    </xsl:stylesheet>
    

    when applied to this XML document (the provided one, slightly shuffled to test for correct sorting):

    <t>
    <Children>
        <Child>
            <ID>2</ID>
            <FathersID>100</FathersID>
            <MothersFatherID>201</MothersFatherID>
        </Child>
        <Child>
            <ID>1</ID>
            <FathersID>100</FathersID>
            <MothersFatherID>200</MothersFatherID>
        </Child>
        <Child>
            <ID>3</ID>
            <FathersID>100</FathersID>
            <MothersFatherID>202</MothersFatherID>
        </Child>
        <Child>
            <ID>4</ID>
            <FathersID>100</FathersID>
            <MothersFatherID>201</MothersFatherID>
        </Child>
        <Child>
            <ID>5</ID>
            <FathersID>101</FathersID>
            <MothersFatherID>201</MothersFatherID>
        </Child>
    </Children>
    <Fathers>
        <Father>
            <ID>100</ID>
        </Father>
        <Father>
            <ID>101</ID>
        </Father>
    </Fathers>
    <MothersFathers>
        <MothersFather>
            <ID>200</ID>
        </MothersFather>
        <MothersFather>
            <ID>201</ID>
        </MothersFather>
        <MothersFather>
            <ID>202</ID>
        </MothersFather>
    </MothersFathers>
    </t>
    

    produces the wanted, correct result:

       Father ID=100
          MothersFather ID=200
          MothersFather ID=201
          MothersFather ID=202
       Father ID=101
          MothersFather ID=201
    

    Do note:

    The transformation executes correctly with both an XSLT 1.0 and an XSLT 2.0 processor.
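Because the question is tagged XSLT 2.0, it may help to compare with a sketch of the same grouping written with XSLT 2.0's xsl:for-each-group, which does the job of the Muenchian generate-id() test. This is not part of the original answer and requires a 2.0 processor; it reuses the kFathersChildren key from the question and the kMFById key from the answer, and mirrors the answer's numeric sort and output text.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kFathersChildren" match="Child" use="FathersID"/>
 <xsl:key name="kMFById" match="MothersFather" use="ID"/>

 <xsl:template match="/">
  <xsl:apply-templates select="t/Fathers/Father"/>
 </xsl:template>

 <xsl:template match="Father">
  <xsl:text>&#10;Father ID=</xsl:text><xsl:value-of select="ID"/>
  <!-- One group per distinct MothersFatherID among this father's children -->
  <xsl:for-each-group select="key('kFathersChildren', ID)" group-by="MothersFatherID">
   <!-- The sort key is evaluated with the first Child of each group as context -->
   <xsl:sort select="MothersFatherID" data-type="number"/>
   <xsl:apply-templates select="key('kMFById', current-grouping-key())"/>
  </xsl:for-each-group>
 </xsl:template>

 <xsl:template match="MothersFather">
  <xsl:text>&#10;   MothersFather ID=</xsl:text><xsl:value-of select="ID"/>
 </xsl:template>
</xsl:stylesheet>
```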


    Update:

    The OP has edited the question, asking some questions about this solution:

    I get the use of the keys involved.

    The line <xsl:template match="Children|MothersFathers|text()"/> - how is this line doing its thing? If I step it through a debugger it just jumps straight past this line. If I comment it out there is lots of superfluous output that I can't see the source of.

    You have discovered what this template with an empty body is doing: it prevents the superfluous output from being written. The XSLT processor has a number of built-in templates that are selected for execution when processing a given node for which the XSLT transformation doesn't specify a matching template.

    The built-in template for an element applies templates to all of its children, and the built-in template for text nodes copies them, so together they output the concatenation of the string values of all of the element's descendant text nodes. This is exactly what you see as the superfluous output.

    To avoid this, I have provided a template matching those elements (and text nodes). This overrides (suppresses) the built-in template. As this template has no body, no output is produced.
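For reference, the XSLT 1.0 specification describes the built-in template rules as equivalent to the templates below (the mode-specific variant of the first rule is omitted). This is a sketch for reading, not something you need to add to a stylesheet: with no template of your own for Children, the first rule recurses into its children and the text() rule copies each text node to the output. That is also why the answer's empty template lists text(): the built-in element rule still reaches t and Fathers, and the whitespace-only text nodes between their child elements would otherwise be copied.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Root and elements: just process the child nodes, in document order -->
 <xsl:template match="*|/">
  <xsl:apply-templates/>
 </xsl:template>

 <!-- Text nodes and attributes: copy their string value to the output -->
 <xsl:template match="text()|@*">
  <xsl:value-of select="."/>
 </xsl:template>

 <!-- Processing instructions and comments: produce nothing -->
 <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```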

    And the apply-templates line that gives the MothersFather node <xsl:apply-templates select= "key('kMFById', key('kMFByFId', ID)[generate-id(..) = generate-id(key('ChildByFIdAndMFId', concat(../FathersID,'+',.))[1] ) ] )"> - I've been trying to break this down on paper to see the magic but not quite getting it. It is something like key('kMFById', key('kMFByFId', ID) means get the matching MothersFather nodes by the current Father ID where [generate-id(..) the generated id of '(dot dot)' - something to do with a parent node? which one? equals the generated id based on ChildByFIdAndMFId key [1] - does this 1 get only the first occurrence of the matching generated id's thereby giving my distinct value?

    Your question is about this code fragment:

      <xsl:apply-templates select=
       "key('kMFById',
             key('kMFByFId', ID)
              [generate-id(..)
              =
               generate-id(key('ChildByFIdAndMFId',
                                concat(../FathersID,'+',.)
                              )[1]
                           )
              ]
            )">
         <xsl:sort select="ID" data-type="number"/>
       </xsl:apply-templates>
    

    In order to understand what is going on here, you need to get acquainted with the Muenchian Grouping Method.

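    Essentially, the above code fragment says:

    process the MothersFather elements whose ID equals a MothersFatherID that is a sibling of a FathersID with the same value as the ID of the current (Father) node, taking each FathersID/MothersFatherID pair only once: for each pair, only the MothersFatherID whose parent Child (the ..) is the first Child (the [1]) returned by the key for that pair is kept.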
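A sketch (not from the original answer) of the same stylesheet with the select expression split into named xsl:variable steps, which may make it easier to follow on paper or in a debugger. The keys and output are unchanged from the answer; the variable names vAllMFIds and vFirstMFIds are illustrative.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kMFByFId" match="MothersFatherID" use="../FathersID"/>
 <xsl:key name="kMFById" match="MothersFather" use="ID"/>
 <xsl:key name="ChildByFIdAndMFId" match="Child"
  use="concat(FathersID, '+', MothersFatherID)"/>

 <xsl:template match="Children|MothersFathers|text()"/>

 <xsl:template match="Father">
   Father ID=<xsl:value-of select="ID"/>
  <!-- Step 1: all MothersFatherID elements whose sibling FathersID equals
       this Father's ID; for Father 100 this still contains 201 twice -->
  <xsl:variable name="vAllMFIds" select="key('kMFByFId', ID)"/>

  <!-- Step 2 (Muenchian test): ".." is the Child containing the
       MothersFatherID; keep it only if that Child is the first one ([1])
       the key returns for the same FathersID+MothersFatherID pair -->
  <xsl:variable name="vFirstMFIds" select=
   "$vAllMFIds[generate-id(..)
               = generate-id(key('ChildByFIdAndMFId',
                                 concat(../FathersID, '+', .))[1])]"/>

  <!-- Step 3: look up the MothersFather elements by those ID values -->
  <xsl:apply-templates select="key('kMFById', $vFirstMFIds)">
   <xsl:sort select="ID" data-type="number"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="MothersFather">
      MothersFather ID=<xsl:value-of select="ID"/>
 </xsl:template>
</xsl:stylesheet>
```

Incidentally, because key() returns a node-set and a node-set never contains the same node twice, key('kMFById', $vAllMFIds) on its own would also yield each MothersFather once in this particular case; the Muenchian filter in step 2 becomes essential when you need exactly one representative node per group, for example when iterating over the Child elements themselves.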