xslt, xslt-1.0, xslt-grouping, exslt

Grouping a list of items by a space-separated list of tags, with a catch-all group for the rest


The problem has two facets:

  1. How to categorize items based on specific values within the space-separated contents of a tag.
  2. How to categorize the items that contain none of those values.

As an example, take the following data:

<messages>
  <m> 
    <subject>message tagged with A B C</subject>
    <tags>A B C</tags>
  </m>

  <m> 
    <subject>message tagged with B C D</subject>
    <tags>B C D</tags>
  </m>

  <m> 
    <subject>message tagged with X Y A</subject>
    <tags>X Y A</tags>
  </m>

  <m> 
    <subject>message tagged with C X</subject>
    <tags>C X</tags>
  </m>

  <m>
    <subject>message tagged with Y</subject>
    <tags>Y</tags>
  </m>

</messages>

Given a known set of tags, say

<xsl:param name="pKnownTags">
  <t>A</t>
  <t>B</t>
</xsl:param>

I want to generate output that looks like this:

Messages tagged with A:
* message tagged with A B C
* message tagged with X Y A

Messages tagged with B:
* message tagged with A B C
* message tagged with B C D

Messages tagged with neither:
* message tagged with C X
* message tagged with Y 

Using EXSLT is fine, but otherwise I need an XSLT 1.0 solution. Is this possible?


Solution

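  • This doesn't require anything too fancy: pad both the message's tag list and the tag being looked for with spaces, so that contains() only matches whole tags. Please give the following a try: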

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl"
    >
      <xsl:output method="text"/>
    
      <xsl:param name="pKnownTags">
        <t>A</t>
        <t>B</t>
      </xsl:param>
      <xsl:variable name="pKnownTagsNodeSet" select="exsl:node-set($pKnownTags)/t" />
    
      <xsl:template match="/messages">
        <xsl:apply-templates select="$pKnownTagsNodeSet">
          <xsl:with-param name="docEl" select="." />
        </xsl:apply-templates>
    
        <xsl:text>Messages tagged with none of the above:&#xA;</xsl:text>
        <xsl:apply-templates select="m" mode="checkAbsence" />
      </xsl:template>
    
      <xsl:template match="t">
        <xsl:param name="docEl" select="/.." />
    
        <xsl:value-of select="concat('Messages tagged with ', ., ':&#xA;')"/>
        <xsl:apply-templates select="$docEl/m[contains(concat(' ', tags, ' '),
                                                       concat(' ', current(), ' '))]" />
        <xsl:text>&#xA;</xsl:text>
      </xsl:template>
    
      <xsl:template match="m" mode="checkAbsence">
        <xsl:variable name="currentTagsPadded" select="concat(' ', tags, ' ')" />
        <xsl:apply-templates
              select="(.)[not($pKnownTagsNodeSet[contains($currentTagsPadded,
                                                          concat(' ', ., ' '))]
                             )
                         ]" />
      </xsl:template>
    
      <xsl:template match="m">
        <xsl:value-of select="concat('* ', subject, '&#xA;')"/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    When run on your sample input, this produces:

    Messages tagged with A:
    * message tagged with A B C
    * message tagged with X Y A
    
    Messages tagged with B:
    * message tagged with A B C
    * message tagged with B C D
    
    Messages tagged with none of the above:
    * message tagged with C X
    * message tagged with Y
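
    If EXSLT is not available, one possible variation (a sketch only, not tested against every processor) is to keep the known tags in a top-level data element and read them back with document(''), which loads the stylesheet itself as a source tree. The my: namespace URI, the my:knownTags element and the vKnownTags, vMessages, vTag and vPadded variable names are made up for illustration. The sketch also wraps tags in normalize-space(), on the assumption that real data may contain line breaks or repeated spaces between tags.

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:my="urn:example:known-tags"
        exclude-result-prefixes="my">
      <xsl:output method="text"/>

      <!-- Hypothetical top-level data element; replaces the xsl:param. -->
      <my:knownTags>
        <t>A</t>
        <t>B</t>
      </my:knownTags>

      <!-- document('') returns the stylesheet document, so no node-set()
           conversion is needed to walk the <t> elements. -->
      <xsl:variable name="vKnownTags"
                    select="document('')/*/my:knownTags/t"/>

      <xsl:template match="/messages">
        <!-- Remember the messages: inside the for-each below, the context
             node belongs to the stylesheet document, not the input. -->
        <xsl:variable name="vMessages" select="m"/>

        <xsl:for-each select="$vKnownTags">
          <xsl:variable name="vTag" select="concat(' ', ., ' ')"/>
          <xsl:value-of select="concat('Messages tagged with ', ., ':&#xA;')"/>
          <!-- Space-padded comparison so that only whole tags match. -->
          <xsl:apply-templates
              select="$vMessages[contains(concat(' ', normalize-space(tags), ' '),
                                          $vTag)]"/>
          <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>

        <xsl:text>Messages tagged with none of the above:&#xA;</xsl:text>
        <xsl:for-each select="m">
          <xsl:variable name="vPadded"
                        select="concat(' ', normalize-space(tags), ' ')"/>
          <!-- Keep the message only if no known tag occurs in its list. -->
          <xsl:if test="not($vKnownTags[contains($vPadded,
                                                 concat(' ', ., ' '))])">
            <xsl:apply-templates select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:template>

      <xsl:template match="m">
        <xsl:value-of select="concat('* ', subject, '&#xA;')"/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    On the sample input this should yield the same three groups as above. The trade-off is that the tag list is no longer a parameter the caller can override, and document('') needs the processor to be able to re-read the stylesheet from its URI, which may not hold when the stylesheet is loaded from a string or stream.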