Search code examples
xmloracle-databaseplsql

Prevent Oracle Dbms_XmlDom from escaping attributes


Another question about Oracle's Dbms_XmlDom functionality. This time about setting attributes.

Here's the code:

DECLARE
   doc_   dbms_XmlDom.DomDocument := Dbms_XmlDom.newDomDocument;
   nd_    dbms_XmlDom.DomNode;
   tag_   dbms_XmlDom.DomElement;
   data_in_ VARCHAR2(50) := '£';
   out_   VARCHAR2(32000);
BEGIN
   Dbms_XmlDom.setVersion (doc_, '1.0" encoding="UTF-8" standalone="yes');
   tag_ := Dbms_XmlDom.createElement (
      doc     => doc_,
      tagName => 'coreProperties'
   );
   Dbms_XmlDom.setAttribute(tag_, 'myAttribute', data_in_);
   nd_  := Dbms_XmlDom.appendChild (Dbms_XmlDom.makeNode(doc_), Dbms_XmlDom.makeNode(tag_));

   out_ := Dbms_XmlDom.getXmlType(doc_).getClobVal;
   Dbms_XmlDom.freeDocument (doc_);
   Dbms_Output.Put_Line (out_);
END;

And here's the output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<coreProperties myAttribute="&amp;#163;"/>

I need the output to be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<coreProperties myAttribute="&#163;"/>

In other words, I want to prevent the & in the input data (which gets fed into the attribute's value) from escaping into &amp; by the function setAttribute(). Can it be done?

I should add that if this were data being fed into an element, I could CDATA it. But this can't be done in attributes, I believe.


Solution

  • The direct answer to this question seems to be a "no". It's not possible to inform Dbms_XmlDom that some & characters are themselves part of an escaped character-sequence and therefore should not themselves be escaped.

    However, my original question was a little unfair in that it didn't give a full context of why this was necessary. I didn't think any further context was necessary at the time.

    The full story is that my overall PL/SQL program is building an Excel file and to encapsulate the functionality that I need, I need to encode the cell's format mask like the following (rendering the cell a in UK currency format):

    _-£* #,##0.00_-;\£* #,##0.00_-
    

    Unfortunately, when you put that exact format into an XML tag like this

    <numFmt numFmtId="48" formatCode="_-£* #,##0.00_-;\£* #,##0.00_-"/>
    

    Excel complains of corruption because for whatever reason, Excel doesn't accept £ as an attribute to an XML node.

    A work around that Excel does accept an escaped version of £ into an XML attribute, like this:

    <numFmt numFmtId="48" formatCode="_-&#163;* #,##0.00_-;\&#163;* #,##0.00_-"/>
    

    Essentially therefore, we load £ into an XML file with its HTML escape-sequence. The problem I had was that this XML is being generated by code which itself escapes the & character in &#163;, causing my output XML to look like:

    <numFmt numFmtId="48" formatCode="_-&amp;#163;* #,##0.00_-;\&amp;#163;* #,##0.00_-"/>
    

    So I have a case of a doubly escaped escape-sequence (which again causes an Excel file corruption).

    The solution I found (after taking a step back) is that Excel also accepts different ways of encoding £. Essentially, it accepts £ if it is enclosed in an escaped instance of quote " (such as "£"). So, this XML is accepted by Excel.

    <numFmt numFmtId="48" formatCode="_-&quot;£&quot* #,##0.00_-;\&quot;£&quot;* #,##0.00_-"/>
    

    And this can be encoded in Dmbs_XmlDom in this way:

    DECLARE
       doc_   dbms_XmlDom.DomDocument := Dbms_XmlDom.newDomDocument;
       nd_    dbms_XmlDom.DomNode;
       tag_   dbms_XmlDom.DomElement;
       data_in_ VARCHAR2(50) := '"£"'; -- plus the rest of the currency mask, of course
       out_   VARCHAR2(32000);
    BEGIN
       Dbms_XmlDom.setVersion (doc_, '1.0" encoding="UTF-8" standalone="yes');
       tag_ := Dbms_XmlDom.createElement (doc_, 'coreProperties');
       Dbms_XmlDom.setAttribute(tag_, 'myAttribute', data_in_);
       nd_  := Dbms_XmlDom.appendChild (Dbms_XmlDom.makeNode(doc_), Dbms_XmlDom.makeNode(tag_));
    
       out_ := Dbms_XmlDom.getXmlType(doc_).getClobVal;
       Dbms_XmlDom.freeDocument (doc_);
       Dbms_Output.Put_Line (out_);
    END;
    

    Thanks to @MTO for his/her help. Their answer forced me to rethink the escape sequences used in XML and look at other alternatives that Excel might accept.