
entity translation to customized entity

There are some user-defined entities in the XML data. In order to unescape those entities, we are using the code below:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" use-character-maps="mdash" />
<xsl:character-map name="mdash">
<xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
<xsl:output-character character="&amp;" string="&amp;amp;" />
<xsl:output-character character="&quot;" string="&amp;quot;" />
<xsl:output-character character="&apos;" string="&amp;apos;" />
<xsl:output-character character="&#167;" string="&amp;sect;"/>
<xsl:output-character character="&#36;" string="&amp;dollar;" />
<xsl:output-character character="&#47;" string="&amp;sol;" />
<xsl:output-character character="&#45;" string="&amp;hyphen;" />
</xsl:character-map>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

But there is a special case where &sect; appears twice in a row in the data, for example:

Input: The number &sect;&sect; 1234

The above example should be converted to a special user-defined entity, i.e.

Output: The number &multisect; 1234

That is, &sect;&sect; should be converted to &multisect;.


  • If you want to use a character map, you first need to process the text nodes where the two sect characters can occur and replace each pair with a single character that you don't expect to be used anywhere else. The character map can then convert that character to the string &multisect;. For example, the stylesheet

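    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:fn="http://www.w3.org/2005/xpath-functions"
      exclude-result-prefixes="#all" version="3.0">
      <xsl:param name="multisect-sub" static="yes" as="xs:string" select="'«'"/>
      <xsl:character-map name="sub">
        <xsl:output-character _character="{$multisect-sub}" string="&amp;multisect;"/>
      </xsl:character-map>
      <xsl:mode on-no-match="shallow-copy"/>
      <xsl:output method="xml" indent="yes" use-character-maps="sub"/>
      <xsl:template match="text()">
        <xsl:apply-templates mode="analyze" select="analyze-string(., '&#xA7;&#xA7;')"/>
      </xsl:template>
      <!-- each matched pair becomes the placeholder; non-matching text is copied by the built-in rules -->
      <xsl:template mode="analyze" match="fn:match">
        <xsl:value-of select="$multisect-sub"/>
      </xsl:template>
    </xsl:stylesheet>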

    transforms the input

    <!DOCTYPE text [
      <!ENTITY sect "&#xA7;">
    ]>
    <text>&sect;&sect; 1234</text>

    into the output

    <?xml version="1.0" encoding="UTF-8"?>
    <text>&multisect; 1234</text>
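
    To keep the entity mappings from the question as well, both character maps can presumably be listed on xsl:output. A minimal sketch, assuming the question's map named mdash is declared in the same stylesheet:

    <xsl:output method="xml" indent="yes" use-character-maps="mdash sub"/>

    With that declaration a single remaining section sign would still be serialized as &sect;, because the text() template only replaces pairs.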

    Note that I used '«' only as an example; you might want to use a private-use character or some other character you are sure doesn't occur in your input/output data.
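
    A minimal sketch of that variant, assuming U+E000 (the first code point of the Unicode Private Use Area) never occurs in your data; only the value of the static parameter changes:

    <xsl:param name="multisect-sub" static="yes" as="xs:string" select="'&#xE000;'"/>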

    If you want the result to be well-formed, you also need to add a doctype to the output, e.g. with <xsl:output doctype-system="some.dtd"/>, and ensure that some.dtd declares the entity, e.g. <!ENTITY multisect "&#xA7;&#xA7;">
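
    As a sketch of that setup, assuming a hypothetical DTD file named some.dtd that sits next to the result document, the output declaration of the stylesheet would become

    <xsl:output method="xml" indent="yes" use-character-maps="sub" doctype-system="some.dtd"/>

    and some.dtd would contain at least

    <!ENTITY multisect "&#xA7;&#xA7;">

    The serialized result then starts with <!DOCTYPE text SYSTEM "some.dtd">, so a parser that reads the external DTD can resolve &multisect;.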