Using XSLT 1.0, given a string with arbitrary characters, how can I get back a string that meets the following rules?
In an XSLT stylesheet I'm translating some attributes into elements, but I need to be sure the attribute doesn't contain any characters that can't be used in an element name. I don't care much about preserving the attribute value exactly when it becomes the name, as long as the conversion is predictable. I also don't need to allow every character that is valid in an element name (there are a lot of them).
My first problem was attributes coming in with spaces, which the translate() function can easily convert to underscores:
translate(@name,' ','_')
But soon afterwards I found some attributes containing slashes, so now I have to handle those too, and this will quickly get out of hand. I want to define a whitelist of allowed characters and replace every other character with an underscore, but translate() works from a blacklist: it only replaces the characters you explicitly list.
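For reference, here is a minimal sketch of the blacklist approach described above, extended for slashes. The `<field name="...">` input element is a hypothetical example of where the attribute comes from; the point is that every newly discovered bad character must be appended to both the second and the third argument of translate().

```xml
<!-- Hypothetical input: <field name="first name/alias">Ann</field> -->
<xsl:template match="field">
  <!-- blacklist: each bad character in the second argument needs a
       matching '_' at the same position in the third argument -->
  <xsl:element name="{translate(@name, ' /', '__')}">
    <xsl:value-of select="." />
  </xsl:element>
</xsl:template>
```

For the sample input this yields `<first_name_alias>Ann</first_name_alias>`, but a name containing any character not in the list (for example `&` or `(`) still produces an invalid element name, which XSLT 1.0 treats as an error.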
You could write a recursive template to do this, working through the characters in the string one by one, testing them and changing them if necessary. Something like:
<xsl:template name="normalizeName">
  <xsl:param name="name" />
  <!-- true only for the first character of the original name -->
  <xsl:param name="isFirst" select="true()" />
  <xsl:if test="$name != ''">
    <xsl:variable name="first" select="substring($name, 1, 1)" />
    <xsl:variable name="rest" select="substring($name, 2)" />
    <xsl:choose>
      <!-- letters, ':' and '_' are legal anywhere; digits, '.' and '-'
           are legal anywhere except at the start of a name -->
      <xsl:when test="contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:_', $first) or
                      (not($isFirst) and contains('0123456789.-', $first))">
        <xsl:value-of select="$first" />
      </xsl:when>
      <xsl:otherwise>
        <!-- any other character becomes an underscore -->
        <xsl:text>_</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <!-- recurse on the remaining characters -->
    <xsl:call-template name="normalizeName">
      <xsl:with-param name="name" select="$rest" />
      <xsl:with-param name="isFirst" select="false()" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>
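As a sketch of how the template might be used for the attribute-to-element conversion in the question: an attribute value template such as `name="{...}"` can only hold an XPath expression, not a template call, so the normalized name is captured in a variable first. The `<field>` input element is hypothetical, and the sketch assumes every field has a non-empty `name` attribute and that the `normalizeName` template above is in the same stylesheet.

```xml
<!-- Hypothetical input: <field name="first name">Ann</field> -->
<xsl:template match="field">
  <!-- run the recursive template and keep its text output -->
  <xsl:variable name="elementName">
    <xsl:call-template name="normalizeName">
      <xsl:with-param name="name" select="string(@name)" />
    </xsl:call-template>
  </xsl:variable>
  <!-- the variable can be used inside the attribute value template -->
  <xsl:element name="{$elementName}">
    <xsl:value-of select="." />
  </xsl:element>
</xsl:template>
```

One thing to keep in mind: because `:` is on the whitelist, a name such as `a:b` is treated by `xsl:element` as a prefixed name, and the prefix `a` must then be declared in the stylesheet.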
However, there is a shorter way of doing this if you're prepared for some hackery. First declare some variables:
<xsl:variable name="underscores"
select="'_______________________________________________________'" />
<xsl:variable name="initialNameChars"
select="'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:_'" />
<xsl:variable name="nameChars"
select="concat($initialNameChars, '0123456789.-')" />
Now the technique is to identify the characters in the name that aren't legal by replacing all the legal characters with nothing. You can do this with the translate() function, passing an empty string as its third argument. Once you've got the set of illegal characters that appear in the string, you can replace them with underscores by calling translate() again, this time with the illegal characters as the second argument and the string of underscores as the third. Here's the template:
<xsl:template name="normalizeName">
  <xsl:param name="name" />
  <xsl:variable name="first" select="substring($name, 1, 1)" />
  <xsl:variable name="rest" select="substring($name, 2)" />
  <!-- delete every legal character, leaving only the illegal ones -->
  <xsl:variable name="illegalFirst"
                select="translate($first, $initialNameChars, '')" />
  <xsl:variable name="illegalRest"
                select="translate($rest, $nameChars, '')" />
  <!-- map each illegal character to an underscore -->
  <xsl:value-of select="concat(translate($first, $illegalFirst, $underscores),
                               translate($rest, $illegalRest, $underscores))" />
</xsl:template>
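To see how the two translate() calls cooperate, here is the template traced for a made-up input, `2nd name/x`. The template rule assumes the `normalizeName` template and the three variables above are in the same stylesheet, with `xsl:output method="text"`.

```xml
<!-- Trace for the hypothetical input '2nd name/x':
       $first        = '2'       $rest        = 'nd name/x'
       $illegalFirst = '2'       (a digit may not start a name)
       $illegalRest  = ' /'      (space and slash; letters and digits are legal)
     so '2' becomes '_' and 'nd name/x' becomes 'nd_name_x' -->
<xsl:template match="/">
  <xsl:call-template name="normalizeName">
    <xsl:with-param name="name" select="'2nd name/x'" />
  </xsl:call-template>
  <!-- produces the text _nd_name_x -->
</xsl:template>
```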
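The only thing you have to watch out for is that the string of underscores must be at least as long as the list of illegal characters that can appear within a single name. translate() deletes any character in its second argument that has no counterpart in its third, so an illegal character whose first occurrence in that list falls after the last underscore is silently removed rather than replaced. Making the underscore string as long as the longest name you're likely to encounter will do the trick (though you could probably get away with it being a lot shorter).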
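A small demonstration of the failure mode, using a made-up string: when a character is listed twice in the second argument of translate() only its first position counts, and characters beyond the end of the third argument are deleted.

```xml
<xsl:template match="/">
  <!-- ' / ' lists three illegal characters (space, slash, space), but
       only one underscore is supplied:
         ' ' is at position 1, so it becomes '_' (its repeat at position 3 is ignored)
         '/' is at position 2, which has no counterpart, so it is deleted -->
  <xsl:value-of select="translate('a b/c d', ' / ', '_')" />
  <!-- produces the text a_bc_d -->
</xsl:template>
```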
UPDATE:
I wanted to add to this answer. To generate an underscore string of the required length, you can use this template:
<!-- Generate a string of $n underscores -->
<xsl:template name="gen-replacement">
  <xsl:param name="n"/>
  <xsl:if test="$n > 0">
    <xsl:call-template name="gen-replacement">
      <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
    <xsl:text>_</xsl:text>
  </xsl:if>
</xsl:template>
And call it when you need to generate underscores:
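<!-- $value is the string being normalized; use $replacement wherever $underscores was used -->
<xsl:variable name="replacement">
  <xsl:call-template name="gen-replacement">
    <xsl:with-param name="n" select="string-length($value)"/>
  </xsl:call-template>
</xsl:variable>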
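Putting the two pieces together, here is a sketch of a variant of the translate()-based template that builds its replacement string with `gen-replacement` instead of using the fixed `$underscores` variable, so the replacement can never be too short. The name `normalizeNameSafe` is hypothetical; the sketch assumes the `$initialNameChars` and `$nameChars` variables and the `gen-replacement` template above are declared in the same stylesheet.

```xml
<!-- Hypothetical variant of normalizeName that sizes its own replacement string -->
<xsl:template name="normalizeNameSafe">
  <xsl:param name="name" />
  <xsl:variable name="first" select="substring($name, 1, 1)" />
  <xsl:variable name="rest" select="substring($name, 2)" />
  <!-- one underscore per character of the name, which is always enough -->
  <xsl:variable name="replacement">
    <xsl:call-template name="gen-replacement">
      <xsl:with-param name="n" select="string-length($name)" />
    </xsl:call-template>
  </xsl:variable>
  <!-- the inner translate() finds the illegal characters, the outer one
       maps them to underscores; $replacement is converted to a string -->
  <xsl:value-of select="concat(
      translate($first, translate($first, $initialNameChars, ''), $replacement),
      translate($rest, translate($rest, $nameChars, ''), $replacement))" />
</xsl:template>
```

The trade-off is one recursive call per character of the name to build `$replacement`, in exchange for never having to guess how long `$underscores` needs to be.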