xml xslt html-entities xalan

Leave entity intact in XML + XSLT


I transform XML into (sort of) HTML with XSL stylesheets, using Apache Xalan. The XML can contain entity references such as &mdash;, which must be left as they are in the output. At the beginning of the XML file I have a DOCTYPE that declares these entities. What should I do so that the entity is left unchanged?

<!DOCTYPE article [
<!ENTITY mdash "&mdash;"><!-- em dash -->
]>

gives me SAXParseException: Recursive entity expansion, 'mdash' when &mdash; is encountered in the XML text.


Solution

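  • The declaration in the question defines mdash in terms of itself, which is why the parser reports recursive expansion. Define the entity as the character it stands for, then use it: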

    <!DOCTYPE t [<!ENTITY mdash "&#x2014;">]>
    <t>Hello &mdash; World!</t>
    

    When processed with the simplest possible XSLT stylesheet:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="text"/>
    </xsl:stylesheet>
    

    the expected output (containing the em dash character itself) is produced:

    Hello — World!

    Important:

    In XSLT 2.0 it is possible to use the <xsl:character-map> declaration so that specified characters are written to the output as entity references. In this particular case:

    <xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes"
      use-character-maps="mdash"/>
     <xsl:character-map name="mdash">
      <xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
     </xsl:character-map>
    </xsl:stylesheet>
    

    When the above transformation is applied to the same XML document (shown above), the output is:

    Hello &mdash; World!
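
    Xalan implements XSLT 1.0, where character maps are not available. A rough XSLT 1.0 sketch of the same idea is to split each text node at the em dash and write the entity reference with disable-output-escaping. This is not part of the answer above: it assumes the processor supports disable-output-escaping (an optional feature in XSLT 1.0) and serializes the result itself rather than handing it to a DOM or SAX consumer. The template name emit-mdash is made up for illustration.

    ```xml
    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes"/>

     <!-- Route every text node through the replacement template. -->
     <xsl:template match="text()">
      <xsl:call-template name="emit-mdash">
       <xsl:with-param name="text" select="."/>
      </xsl:call-template>
     </xsl:template>

     <!-- Hypothetical helper: recursively replaces U+2014 with a literal
          "&mdash;" written without output escaping. -->
     <xsl:template name="emit-mdash">
      <xsl:param name="text"/>
      <xsl:choose>
       <xsl:when test="contains($text, '&#x2014;')">
        <xsl:value-of select="substring-before($text, '&#x2014;')"/>
        <xsl:text disable-output-escaping="yes">&amp;mdash;</xsl:text>
        <xsl:call-template name="emit-mdash">
         <xsl:with-param name="text"
          select="substring-after($text, '&#x2014;')"/>
        </xsl:call-template>
       </xsl:when>
       <xsl:otherwise>
        <xsl:value-of select="$text"/>
       </xsl:otherwise>
      </xsl:choose>
     </xsl:template>
    </xsl:stylesheet>
    ```

    On a processor that honors disable-output-escaping, this should write Hello &mdash; World! for the same input document. The output is no longer well-formed XML unless the consumer declares the mdash entity, which is usually acceptable when the target is HTML.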