
What's the best way to move space-delimited tokens from one attribute to another in XSLT-2.0?


I'm trying to move space-delimited tokens from one attribute to another in XSLT-2.0. For example, given

<!-- SOURCE DOCUMENT -->
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <p class="foo"/>
    <p class="foo bar baz"/>
    <p class="foo bar baz" outputclass="BAR"/>
    <p class="foo bar baz" outputclass="BAR HELLO"/>
</root>

I need to move @class="foo" to @outputclass="FOO" and @class="bar" to @outputclass="BAR", deleting the source attribute if it becomes empty and augmenting the target attribute if it exists (simple token-set operations):

<!-- RESULTING DOCUMENT -->
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <p             outputclass="FOO"/>
    <p class="baz" outputclass="FOO BAR"/>
    <p class="baz" outputclass="FOO BAR"/>
    <p class="baz" outputclass="FOO BAR HELLO"/>
</root>

I think I have everything figured out except the actual token-moving part. Every approach I try ends up complicated and broken, and I feel like XSLT 2.0 surely has a simple solution that I'm missing.

Here's what I have so far:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mine="mine:local"
    exclude-result-prefixes="xs"
    version="2.0">

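    <!-- baseline identity transform -->
    <!-- (for non-elements - attributes, whitespace PCDATA, etc.)  -->
    <!-- XSLT 2.0 match patterns do not allow "except", so the node kinds are listed explicitly -->
    <xsl:template match="@*|text()|comment()|processing-instruction()">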
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- for element nodes, remap attributes then copy element -->
    <xsl:template match="*">
        <!-- get original attribute sequence -->
        <xsl:variable name="atts1" select="@*"/>

        <!-- use our function to remap two attribute tokens -->
        <xsl:variable name="atts2" select="mine:remap($atts1, 'class', 'foo', 'outputclass', 'FOO')"/>
        <xsl:variable name="atts3" select="mine:remap($atts2, 'class', 'bar', 'outputclass', 'BAR')"/>

        <!-- stuff updated attribute sequence into element -->
        <xsl:copy>
            <xsl:sequence select="$atts3"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- remap  @from_att~="$from_token"  to  @to_att~="$to_token" -->
    <xsl:function name="mine:remap">
        <xsl:param name="orig_atts"/>
        <xsl:param name="from_att"/>
        <xsl:param name="from_token"/>
        <xsl:param name="to_att"/>
        <xsl:param name="to_token"/>

        <!-- ******** TOKEN-MOVING MAGIC!?! ******** -->

        <xsl:sequence select="$orig_atts"/>
    </xsl:function>
</xsl:stylesheet>

Basically I need to figure out how TOKEN-MOVING MAGIC!?! can move a single token (including deletion of empty "from" attributes). I've searched quite a bit but I haven't seen this particular problem covered.

Edit: The number and names of attributes to remap can be anything, and their values are case-sensitive. It's the magic inside the mine:remap function to remap a single value in an attribute sequence that I'm looking for.

Edit: The reason for approaching attribute modification with a function is that we have a number of different token remappings to apply to different files, and I hoped to allow our non-XSLT-savvy users to easily adjust the remappings to their needs. I was unable to figure out how to provide similar generalization with a template-matching-based approach.

Thanks!


Solution

  • Here is a short XSLT 2.0 solution (just 26 lines):

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
    
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
      
      <xsl:template match="p/@class[tokenize(., ' ') = ('foo', 'bar')]">
        <xsl:if test="tokenize(., ' ')[not(. = ('foo', 'bar'))]">
            <xsl:attribute name="class" 
                 select="string-join(tokenize(., ' ')[not(. = ('foo', 'bar'))], ' ')"/>
        </xsl:if>
        <xsl:attribute name="outputclass" select=
          "upper-case(string-join(
                       (
                        tokenize(., ' ')[. = ('foo', 'bar')],
                        tokenize(../@outputclass, ' ')
                                     [not(lower-case(.) = tokenize(current(), ' '))]
                        ),
                        ' '
                                  )
                      )"/>
      </xsl:template>
      
      <xsl:template match="p/@outputclass[../@class[tokenize(., ' ') = ('foo', 'bar')]]"/>
    </xsl:stylesheet>
    

    When this transformation is applied to the provided XML document:

    <root>
        <p class="foo"/>
        <p class="foo bar baz"/>
        <p class="foo bar baz" outputclass="BAR"/>
        <p class="foo bar baz" outputclass="BAR HELLO"/>
    </root>
    

    the wanted, correct result is produced:

    <root>
        <p outputclass="FOO"/>
        <p class="baz" outputclass="FOO BAR"/>
        <p class="baz" outputclass="FOO BAR"/>
        <p class="baz" outputclass="FOO BAR HELLO"/>
    </root>
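
    The templates above rest on three XPath 2.0 token-set idioms: a general comparison (=) between two sequences is true if any pair of items matches, and a predicate filters the tokenized value into the tokens that stay and the tokens that move. As a minimal sketch, not part of the original answer, the stylesheet below evaluates those expressions on a literal string. The template name main and the variables $tokens and $moved are illustrative assumptions.

    ```xslt
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>

      <!-- Hypothetical named entry point, so no source document is required -->
      <xsl:template name="main">
        <xsl:variable name="tokens" select="tokenize('foo bar baz', ' ')"/>
        <xsl:variable name="moved" select="('foo', 'bar')"/>

        <!-- existential test: does any token appear in $moved? -->
        <xsl:value-of select="$tokens = $moved"/>
        <xsl:text>&#10;</xsl:text>
        <!-- tokens that stay in the source attribute -->
        <xsl:value-of select="$tokens[not(. = $moved)]"/>
        <xsl:text>&#10;</xsl:text>
        <!-- tokens that move, upper-cased and joined for the target attribute -->
        <xsl:value-of select="upper-case(string-join($tokens[. = $moved], ' '))"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Started from the main template, this should print true, baz, and FOO BAR on separate lines, which are the match test, the remaining class value, and the new outputclass tokens for the second p element.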
    

    Update:

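    Here is the same transformation with almost everything parameterized, as requested in a comment by the OP (just 32 lines). The new token names are still derived with upper-case(), so the $pnewNames parameter is declared but not yet used: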

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:param name="pfromName" select="'class'"/>
     <xsl:param name="ptoName" select="'outputclass'"/>
     <xsl:param name="pTokens" select="'foo', 'bar'"/>
     <xsl:param name="pnewNames" select="'FOO', 'BAR'"/>
    
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="p/@*[name() = $pfromName][tokenize(., ' ') = $pTokens]">
        <xsl:if test="tokenize(., ' ')[not(. = $pTokens)]">
            <xsl:attribute name="{$pfromName}"
                 select="string-join(tokenize(., ' ')[not(. = $pTokens)], ' ')"/>
        </xsl:if>
        <xsl:attribute name="{$ptoName}" select=
          "upper-case(string-join(
                       (
                        tokenize(., ' ')[. = $pTokens],
                        tokenize(../@*[name()=$ptoName], ' ')
                                     [not(lower-case(.) = tokenize(current(), ' '))]
                        ),
                        ' '
                                  )
                      )"/>
      </xsl:template>
    
      <xsl:template 
        match="p/@*[name()=$ptoName][../@*[name()=$pfromName][tokenize(., ' ') = $pTokens]]"/>
    </xsl:stylesheet>
    

    Update 2:

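    Here is a completely parameterized XSLT 2.0 transformation (apart from the element name p in the match patterns), just 37 lines. Instead of using the upper-case() and lower-case() functions, it looks up the replacement for each moved token at the same position in $pnewNames: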

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:param name="pfromName" select="'class'"/>
     <xsl:param name="ptoName" select="'outputclass'"/>
     <xsl:param name="pTokens" select="'foo', 'bar'"/>
     <xsl:param name="pnewNames" select="'FOO', 'BAR'"/>
    
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="p/@*[name() = $pfromName][tokenize(., ' ') = $pTokens]">
        <xsl:if test="tokenize(., ' ')[not(. = $pTokens)]">
            <xsl:attribute name="{$pfromName}"
                 select="string-join(tokenize(., ' ')[not(. = $pTokens)], ' ')"/>
        </xsl:if>
        <xsl:attribute name="{$ptoName}" select=
          "string-join(
                       distinct-values(
                                (for $token in tokenize(., ' ')[. = $pTokens],
                                        $n in 1 to count($pTokens),
                                        $ind in $n[$token eq $pTokens[$n]]
                                      return $pnewNames[$ind]
                                 ,
                                  tokenize(../@*[name()=$ptoName], ' ')
                                  )
                                        ),
                        ' '
                        )
                      "/>
      </xsl:template>
    
      <xsl:template
      match="p/@*[name()=$ptoName][../@*[name()=$pfromName][tokenize(., ' ') = $pTokens]]"/>
    </xsl:stylesheet>
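
    For readers who prefer the function-based design from the question, the TOKEN-MOVING MAGIC placeholder in mine:remap could be filled in roughly as follows. This is a sketch under stated assumptions, not part of the original answer: it assumes unprefixed attribute names, at most one attribute per name in the sequence, and case-sensitive token comparison. The variable names $from_tokens, $remaining, and $to_tokens are introduced here for illustration.

    ```xslt
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:mine="mine:local"
        exclude-result-prefixes="xs mine">

      <!-- copy non-element, non-attribute nodes unchanged -->
      <xsl:template match="text()|comment()|processing-instruction()">
        <xsl:copy/>
      </xsl:template>

      <!-- for element nodes, remap attributes, then copy the element -->
      <xsl:template match="*">
        <xsl:variable name="atts1" select="@*"/>
        <xsl:variable name="atts2" select="mine:remap($atts1, 'class', 'foo', 'outputclass', 'FOO')"/>
        <xsl:variable name="atts3" select="mine:remap($atts2, 'class', 'bar', 'outputclass', 'BAR')"/>
        <xsl:copy>
          <xsl:sequence select="$atts3"/>
          <xsl:apply-templates select="node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- remap  @from_att~="$from_token"  to  @to_att~="$to_token" -->
      <xsl:function name="mine:remap" as="attribute()*">
        <xsl:param name="orig_atts" as="attribute()*"/>
        <xsl:param name="from_att" as="xs:string"/>
        <xsl:param name="from_token" as="xs:string"/>
        <xsl:param name="to_att" as="xs:string"/>
        <xsl:param name="to_token" as="xs:string"/>

        <!-- tokens of the source attribute (empty sequence if it is absent) -->
        <xsl:variable name="from_tokens"
            select="tokenize(normalize-space($orig_atts[name() = $from_att]), ' ')"/>

        <xsl:choose>
          <xsl:when test="$from_tokens = $from_token">
            <!-- attributes not involved in the remapping pass through -->
            <xsl:sequence select="$orig_atts[not(name() = ($from_att, $to_att))]"/>

            <!-- source attribute minus the moved token; dropped when empty -->
            <xsl:variable name="remaining" select="$from_tokens[. ne $from_token]"/>
            <xsl:if test="exists($remaining)">
              <xsl:attribute name="{$from_att}" select="string-join($remaining, ' ')"/>
            </xsl:if>

            <!-- target attribute: keep existing tokens, append the new one if absent -->
            <xsl:variable name="to_tokens"
                select="tokenize(normalize-space($orig_atts[name() = $to_att]), ' ')"/>
            <xsl:attribute name="{$to_att}"
                select="string-join(($to_tokens, $to_token[not(. = $to_tokens)]), ' ')"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- token not present: return the attributes unchanged -->
            <xsl:sequence select="$orig_atts"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:function>
    </xsl:stylesheet>
    ```

    Each call moves a single token, so further remappings can be chained by adding one more variable per mapping in the element template. Existing target tokens keep their position and new ones are appended, so the token order in outputclass may differ from the listing in the question even though the token set is the same.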