Another question about Oracle's Dbms_XmlDom
functionality. This time about setting attributes.
Here's the code:
DECLARE
doc_ dbms_XmlDom.DomDocument := Dbms_XmlDom.newDomDocument;
nd_ dbms_XmlDom.DomNode;
tag_ dbms_XmlDom.DomElement;
data_in_ VARCHAR2(50) := '£';
out_ VARCHAR2(32000);
BEGIN
Dbms_XmlDom.setVersion (doc_, '1.0" encoding="UTF-8" standalone="yes');
tag_ := Dbms_XmlDom.createElement (
doc => doc_,
tagName => 'coreProperties'
);
Dbms_XmlDom.setAttribute(tag_, 'myAttribute', data_in_);
nd_ := Dbms_XmlDom.appendChild (Dbms_XmlDom.makeNode(doc_), Dbms_XmlDom.makeNode(tag_));
out_ := Dbms_XmlDom.getXmlType(doc_).getClobVal;
Dbms_XmlDom.freeDocument (doc_);
Dbms_Output.Put_Line (out_);
END;
And here's the output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<coreProperties myAttribute="&#163;"/>
I need the output to be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<coreProperties myAttribute="£"/>
In other words, I want to prevent the &
in the input data (which gets fed into the attribute's value) from escaping into &
by the function setAttribute()
. Can it be done?
I should add that if this were data being fed into an element, I could CDATA it. But this can't be done in attributes, I believe.
The direct answer to this question seems to be a "no". It's not possible to inform Dbms_XmlDom
that some &
characters are themselves part of an escaped character-sequence and therefore should not themselves be escaped.
However, my original question was a little unfair in that it didn't give a full context of why this was necessary. I didn't think any further context was necessary at the time.
The full story is that my overall PL/SQL program is building an Excel file and to encapsulate the functionality that I need, I need to encode the cell's format mask like the following (rendering the cell a in UK currency format):
_-£* #,##0.00_-;\£* #,##0.00_-
Unfortunately, when you put that exact format into an XML tag like this
<numFmt numFmtId="48" formatCode="_-£* #,##0.00_-;\£* #,##0.00_-"/>
Excel complains of corruption because for whatever reason, Excel doesn't accept £
as an attribute to an XML node.
A work around that Excel does accept an escaped version of £
into an XML attribute, like this:
<numFmt numFmtId="48" formatCode="_-£* #,##0.00_-;\£* #,##0.00_-"/>
Essentially therefore, we load £
into an XML file with its HTML escape-sequence. The problem I had was that this XML is being generated by code which itself escapes the &
character in £
, causing my output XML to look like:
<numFmt numFmtId="48" formatCode="_-&#163;* #,##0.00_-;\&#163;* #,##0.00_-"/>
So I have a case of a doubly escaped escape-sequence (which again causes an Excel file corruption).
The solution I found (after taking a step back) is that Excel also accepts different ways of encoding £
. Essentially, it accepts £
if it is enclosed in an escaped instance of quote "
(such as "£"
). So, this XML is accepted by Excel.
<numFmt numFmtId="48" formatCode="_-"£"* #,##0.00_-;\"£"* #,##0.00_-"/>
And this can be encoded in Dmbs_XmlDom
in this way:
DECLARE
doc_ dbms_XmlDom.DomDocument := Dbms_XmlDom.newDomDocument;
nd_ dbms_XmlDom.DomNode;
tag_ dbms_XmlDom.DomElement;
data_in_ VARCHAR2(50) := '"£"'; -- plus the rest of the currency mask, of course
out_ VARCHAR2(32000);
BEGIN
Dbms_XmlDom.setVersion (doc_, '1.0" encoding="UTF-8" standalone="yes');
tag_ := Dbms_XmlDom.createElement (doc_, 'coreProperties');
Dbms_XmlDom.setAttribute(tag_, 'myAttribute', data_in_);
nd_ := Dbms_XmlDom.appendChild (Dbms_XmlDom.makeNode(doc_), Dbms_XmlDom.makeNode(tag_));
out_ := Dbms_XmlDom.getXmlType(doc_).getClobVal;
Dbms_XmlDom.freeDocument (doc_);
Dbms_Output.Put_Line (out_);
END;
Thanks to @MTO for his/her help. Their answer forced me to rethink the escape sequences used in XML and look at other alternatives that Excel might accept.