
Which are the safest ways to handle accented characters in XSL transformations?


My boss told me to make sure of this, because there is a small possibility that a character could come out garbled, for example 'solución' rendered as 'soluci&3n'.

Some words need accents. This is the original code in the XSL stylesheet; it is very simple:

<table:table-cell table:style-name="TablaIkus.FXI" office:value-type="string">
    <text:p text:style-name="PIntBodyLeft">Fecha inicio cómputo intereses</text:p>
</table:table-cell>

The thing is, the output shows the word correctly in the final .odt file.

But, just in case... Is there a function to escape accents to avoid strange output?


Solution

  • But, just in case... Is there a function to escape accents to avoid strange output?

    XML takes substantially all of Unicode as its character set. Accented characters do not require special handling, either in XML generally or in XSLT in particular. Therefore, no, there is no function to escape accents or accented characters, and none is needed.

    Your question reflects a misunderstanding, however. As I wrote in comments, XML affords multiple, semantically equivalent ways to represent the same character. This applies to your XML input documents, your stylesheet documents, and your result documents for output method "xml". For example, if the document encoding supports it, then the character ó ("LATIN SMALL LETTER O WITH ACUTE", as Unicode names it) can be conveyed directly via its representation in the document's character encoding, but it can also, equivalently, be represented as an XML numeric character reference in either of two forms: hexadecimal (&#xf3;) or decimal (&#243;).
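
    As a quick illustration of that equivalence, here is a minimal sketch using Python's standard-library xml.etree.ElementTree parser rather than an XSLT processor (that substitution is an assumption made only so the example is runnable). The three tiny documents are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same text: a literal character, a hexadecimal
# character reference, and a decimal character reference.
literal = "<p>solución</p>"
hex_ref = "<p>soluci&#xf3;n</p>"
dec_ref = "<p>soluci&#243;n</p>"

# After parsing, the application sees identical text in all three cases.
texts = [ET.fromstring(doc).text for doc in (literal, hex_ref, dec_ref)]
all_equal = len(set(texts)) == 1

print(texts)      # ['solución', 'solución', 'solución']
print(all_equal)  # True
```

    Any conforming XML parser should behave the same way, because character references are resolved during parsing, before the application sees the text.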

    An XSLT processor does not commit any error if it outputs XML containing different representations of some characters than those used in the input. Under some circumstances, it may actually need to do so. If it indeed performs such a conversion, then it does not thereby alter the meaning of the document in any way. It sounds like you want to avoid such conversions, but that's just not a problem you should worry about.
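
    The same point can be sketched from the output side. The example below again uses Python's xml.etree.ElementTree serializer as a stand-in for an XSLT processor's serializer (an assumption for illustration; your processor's exact output may differ). When the requested output encoding cannot represent ó directly, the serializer switches to a character reference, and the text read back is unchanged.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<p>solución</p>")

# UTF-8 can encode the character directly, so it is written as raw bytes.
utf8_bytes = ET.tostring(element, encoding="utf-8")

# US-ASCII cannot, so the serializer falls back to a character reference.
ascii_bytes = ET.tostring(element, encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)

# Parsing either serialization yields the same text.
round_trip_utf8 = ET.fromstring(utf8_bytes).text
round_trip_ascii = ET.fromstring(ascii_bytes).text
print(round_trip_utf8 == round_trip_ascii)
```

    In XSLT, the corresponding choice is made through the encoding attribute of xsl:output; the meaning of the result document is the same whichever encoding is chosen.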

    Do, however, ensure that your input and stylesheet documents accurately declare their character encoding in their XML declarations. For example,

    <?xml version="1.0" encoding="UTF-8"?>
    

    If a document has no XML declaration, or its declaration does not specify an encoding, then make certain it is actually encoded in XML's default encoding, UTF-8 (UTF-16 is also recognized without a declaration, but only when the file starts with a byte order mark). Misrepresenting the encoding to your XML tools is exactly how characters in your documents can get scrambled.
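
    To make that failure mode concrete, here is a small sketch, again with Python's xml.etree.ElementTree standing in for whatever parser your XSLT processor uses (an assumption for illustration). The documents are built by hand so that the declared encoding and the real byte encoding can be made to disagree.

```python
import xml.etree.ElementTree as ET

body = "<p>solución</p>"

# The bytes really are UTF-8 and the declaration says so: the text survives.
honest = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")

# The same UTF-8 bytes, but the declaration wrongly claims ISO-8859-1.
mislabeled = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body.encode("utf-8")

good = ET.fromstring(honest).text
bad = ET.fromstring(mislabeled).text
print(good)  # solución
print(bad)   # soluciÃ³n

# ISO-8859-1 bytes with no declaration are read as UTF-8, where the lone
# byte 0xF3 is not a valid sequence, so the parser rejects the document.
undeclared_latin1 = body.encode("iso-8859-1")
try:
    ET.fromstring(undeclared_latin1)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

    Note that in the mislabeled case nothing fails loudly: the document parses, and the damage only shows up as wrong characters in the output. That is why the declaration has to match the actual encoding of the file.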