
How to fix a special character in XSLT


I am working with the XML below and need to remove a special character from firstname: the é in Andrés (I'm not sure what this character is actually called). If I pass firstname through as is, processing fails in the vendor system.

<?xml version="1.0" encoding="UTF-8"?>
<reportentry>
<reportdata>
    <id>12345</id>
    <firstname>Andrés</firstname>
    <lastname>Williams</lastname>
</reportdata>
</reportentry>

I tried a simple replace() call, which works; the code is below. Is there a better way to deal with this? Any suggestions?

 <xsl:value-of select="replace($string1, 'é', 'e')"/>

Full code

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">

<xsl:variable name="string1" select="/reportentry/reportdata/firstname"/>
<xsl:variable name="comma" select="','"/>
<xsl:output method="text" omit-xml-declaration="yes"/>

<xsl:template match="/reportentry">

    <xsl:value-of select="reportdata/id"/>
    <xsl:value-of select="$comma"/>
    <xsl:value-of select="replace($string1, 'é', 'e')"/>
    <xsl:value-of select="$comma"/>
    <xsl:value-of select="reportdata/lastname"/>

</xsl:template>
</xsl:stylesheet>

The expected result is 12345,Andres,Williams


Solution

  • You can strip most diacritics (such as the acute accent on é) by using normalize-unicode() to convert the string to decomposed normal form (NFD), which splits é into a plain e followed by a combining accent, and then using replace() to remove all "non-spacing mark" characters (category Mn).

    So replace(normalize-unicode(xxx, 'NFD'), '\p{Mn}', '')

    Not tested.

    But it would be better to modernise the receiving application so it can handle international names...
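
For reference, here is a minimal sketch of how that expression might slot into the stylesheet from the question. It assumes an XSLT 2.0 processor that supports the 'NFD' normalization form (the XPath functions spec only requires 'NFC', so this is processor-dependent; Saxon is one processor that accepts it). Reading firstname relative to the matched element and using xsl:text for the commas are choices made for this sketch, not part of the original code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text"/>

<xsl:template match="/reportentry">
    <xsl:value-of select="reportdata/id"/>
    <xsl:text>,</xsl:text>
    <!-- Decompose to NFD (assumes the processor supports it), then drop
         every non-spacing mark so that é becomes a plain e -->
    <xsl:value-of select="replace(normalize-unicode(reportdata/firstname, 'NFD'), '\p{Mn}', '')"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="reportdata/lastname"/>
</xsl:template>

</xsl:stylesheet>
```

With the sample XML above this should produce 12345,Andres,Williams. Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so those would still need an explicit replace() or translate() if the vendor system rejects them too.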