I am dealing with the XML below, where I need to remove a special character from firstname: the é in "Andrés" (I'm not sure what this character is actually called). If I pass firstname through as-is, processing fails in the vendor system.
<?xml version="1.0" encoding="UTF-8"?>
<reportentry>
  <reportdata>
    <id>12345</id>
    <firstname>Andrés</firstname>
    <lastname>Williams</lastname>
  </reportdata>
</reportentry>
I simply tried the replace() function, which works; the code is below. Is there a better way to deal with this? Any suggestions?
<xsl:value-of select="replace($string1, 'é', 'e')"/>
Full code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
  <xsl:variable name="string1" select="/reportentry/reportdata/firstname"/>
  <xsl:variable name="comma" select="','"/>
  <xsl:output method="text" omit-xml-declaration="yes"/>
  <xsl:template match="/reportentry">
    <xsl:value-of select="reportdata/id"/>
    <xsl:value-of select="$comma"/>
    <xsl:value-of select="replace($string1, 'é', 'e')"/>
    <xsl:value-of select="$comma"/>
    <xsl:value-of select="reportdata/lastname"/>
  </xsl:template>
</xsl:stylesheet>
The expected result is 12345,Andres,Williams
You can strip most diacritics by using normalize-unicode() to convert the string to decomposed normal form (NFD), and then using replace() to remove all "non-spacing mark" characters (Unicode category Mn).
So: replace(normalize-unicode(xxx, 'NFD'), '\p{Mn}', '')
Not tested.
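A minimal sketch of that approach applied to the sample input, assuming an XSLT 2.0 processor that supports the NFD normalization form (the specification only requires NFC, so check your processor's documentation). The function name f:strip-diacritics and the namespace URI urn:example:local-functions are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="xs f" version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose to NFD, so "é" (U+00E9) becomes
       "e" followed by the combining acute accent U+0301, then remove
       every non-spacing mark (\p{Mn}), leaving the base letter. -->
  <xsl:function name="f:strip-diacritics" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:sequence select="replace(normalize-unicode($s, 'NFD'), '\p{Mn}', '')"/>
  </xsl:function>

  <xsl:template match="/reportentry">
    <!-- Join the three fields with commas instead of separate value-of calls -->
    <xsl:value-of
        select="reportdata/id, f:strip-diacritics(reportdata/firstname), reportdata/lastname"
        separator=","/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input, this should produce 12345,Andres,Williams. Unlike replace($string1, 'é', 'e'), it also handles other accented letters without listing them individually. Letters such as ø or ß have no canonical decomposition, so they pass through unchanged; that is why this covers most, but not all, diacritics.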
But it would be better to modernise the receiving application so that it can handle international names.