XSLT: Convert specific characters of a string individually to a string *including* their values in hex


I need some help converting certain 8-bit characters to a string containing their hexadecimal value. I want to convert German umlauts (äöüÄÖÜß) to their hexadecimal RTF representation. For example, the character ä should be converted to \'E4.

I know other solutions for converting characters, such as "xslt: converting characters to their hexadecimal Unicode representation". But when I tried to use this in combination with replace(), only the $ character is converted, not the content of the matched group $0.

Here is what I have tried. Somewhere in the stylesheet I use this to convert some characters of the string:

    <xsl:value-of select="replace($rtfText, '[äöüßÄÖÜ]', at:char-to-unicode('$0'))"/>

at:int-to-hex is the function from the other question. I thought it would be a good idea to call it from another function:

    <xsl:function name="at:char-to-unicode" as="xs:string">
        <xsl:param name="in" as="xs:string"/>
        <xsl:sequence select="concat('\\''', at:int-to-hex(string-to-codepoints('$in')[1]))"/>
    </xsl:function>

    <xsl:function name="at:int-to-hex" as="xs:string">
        <xsl:param name="in" as="xs:integer"/>
        <xsl:sequence
            select="if ($in eq 0)
            then '0'
            else
            concat(if ($in ge 16)
            then at:int-to-hex($in idiv 16)
            else '',
            substring('0123456789ABCDEF',
            ($in mod 16) + 1, 1))"/>
    </xsl:function>

Can anybody help?


Solution

  • Since you say you use XSLT 2 or 3 and want to replace the characters throughout the complete output document, I think a character map is the easiest approach:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:output method="text" use-character-maps="rtf-hex"/>
    
      <xsl:character-map name="rtf-hex">
           <xsl:output-character character="ä" string="\'E4"/>
           <xsl:output-character character="ö" string="\'F6"/>
           <xsl:output-character character="ü" string="\'FC"/>
           <xsl:output-character character="Ä" string="\'C4"/>
           <xsl:output-character character="Ö" string="\'D6"/>
           <xsl:output-character character="Ü" string="\'DC"/>
           <xsl:output-character character="ß" string="\'DF"/>
      </xsl:character-map>
    
    </xsl:stylesheet>
    

    https://xsltfiddle.liberty-development.net/pPzifpr/1 has an example.

    In XSLT 3 you can also apply a character map locally to a string, thanks to the serialize function, whose second parameter lets you define the character map as an XPath 3.1 map(xs:string, xs:string), e.g.

    serialize(., map { "method" : "text", "use-character-maps" : map{"Ä":"\C4","ä":"\E4","Ö":"\D6","ö":"\F6","Ü":"\DC","ü":"\FC","ß":"\DF"} })
    

    to have the mapping applied so

    <text xml:lang="de">Dies ist ein Test mit Umlauten: ä, ö, ü, ß, Ä, Ö, Ü.</text>
    

    would be transformed by

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:output method="xml"/>
    
      <xsl:template match="text">
          <xsl:copy>
              <xsl:value-of select='serialize(., map { "method" : "text", "use-character-maps" : map{"Ä":"\C4","ä":"\E4","Ö":"\D6","ö":"\F6","Ü":"\DC","ü":"\FC","ß":"\DF"} })'/>
          </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

    to

    <text>Dies ist ein Test mit Umlauten: \E4, \F6, \FC, \DF, \C4, \D6, \DC.</text>
    

    I realize that the last example doesn't produce the exact replacement you described. I tried to generate the map dynamically and ran into a problem with Saxon when generating the right syntax to use the map inside an XSLT attribute, so you will need to change values like map{"Ä":"\C4" to map{"Ä":"\&apos;C4".
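
    Applied to the stylesheet above, that fix might look like the following sketch. It is untested and only swaps in &apos; for the missing apostrophe; because the select attribute is delimited by single quotes, a literal ' cannot appear inside it, while &apos; is turned into ' by the XML parser before the XPath expression is evaluated.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="3.0">

      <xsl:output method="xml"/>

      <xsl:template match="text">
          <xsl:copy>
              <!-- &apos; becomes ' after XML parsing, so each map value is \'XX -->
              <xsl:value-of select='serialize(., map { "method" : "text", "use-character-maps" : map{"Ä":"\&apos;C4","ä":"\&apos;E4","Ö":"\&apos;D6","ö":"\&apos;F6","Ü":"\&apos;DC","ü":"\&apos;FC","ß":"\&apos;DF"} })'/>
          </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>

    With the sample input above, the umlauts should then come out as \'E4, \'F6 and so on instead of \E4, \F6.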

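    As for matching characters with a regular expression and replacing them, in XSLT 3.0 you can use the analyze-string function. Note that the pattern below matches every character in the Latin-1 Supplement block (U+0080 to U+00FF), not only the seven characters from the question: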

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        xmlns:mf="http://example.com/mf"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:function name="mf:int-to-hex" as="xs:string">
          <xsl:param name="int" as="xs:integer"/>
          <xsl:sequence
             select="if ($int eq 0) 
                     then '0' 
                     else concat(
                              if ($int ge 16)
                              then mf:int-to-hex($int idiv 16) else '',
                              substring('0123456789ABCDEF', ($int mod 16) + 1, 1)
                          )"/>
      </xsl:function>
    
      <xsl:template match="text()">
          <xsl:value-of select="analyze-string(., '\p{IsLatin-1Supplement}')/*/(if (. instance of element(fn:match)) then '\''' || mf:int-to-hex(string-to-codepoints(.)) else string())" separator=""/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    https://xsltfiddle.liberty-development.net/94rmq6f
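
    As to why the attempt in the question converts only the $ character: the arguments of replace() are evaluated before replace() itself runs, so at:char-to-unicode('$0') is called once with the literal two-character string $0 rather than once per match. Inside that function, '$in' in quotes is a string literal and not a reference to the parameter, so string-to-codepoints('$in')[1] is always 36 (hex 24), the code point of $.

    If you are limited to XSLT 2.0, where the analyze-string function is not available, the xsl:analyze-string instruction evaluates its body once per match. The following is an untested sketch under that assumption; the mf prefix and the character class are illustrative choices, not something from the question.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:mf="http://example.com/mf"
        exclude-result-prefixes="#all"
        version="2.0">

      <!-- Hex digits of a non-negative integer, most significant digit first -->
      <xsl:function name="mf:int-to-hex" as="xs:string">
          <xsl:param name="int" as="xs:integer"/>
          <xsl:sequence
             select="concat(
                         if ($int ge 16) then mf:int-to-hex($int idiv 16) else '',
                         substring('0123456789ABCDEF', ($int mod 16) + 1, 1)
                     )"/>
      </xsl:function>

      <!-- Identity template, because xsl:mode does not exist in XSLT 2.0 -->
      <xsl:template match="@* | node()">
          <xsl:copy>
              <xsl:apply-templates select="@* | node()"/>
          </xsl:copy>
      </xsl:template>

      <!-- The explicit priority avoids a conflict with node() in the identity template -->
      <xsl:template match="text()" priority="1">
          <xsl:analyze-string select="." regex="[äöüßÄÖÜ]">
              <xsl:matching-substring>
                  <!-- . is the single matched character here -->
                  <xsl:value-of select="concat('\''', mf:int-to-hex(string-to-codepoints(.)))"/>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                  <xsl:value-of select="."/>
              </xsl:non-matching-substring>
          </xsl:analyze-string>
      </xsl:template>

    </xsl:stylesheet>

    For ä (code point 228), mf:int-to-hex computes 228 idiv 16 = 14, giving E, and 228 mod 16 = 4, giving 4, so the character is written as \'E4.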