Search code examples
sqlsql-serverxmlcdata

SQL SERVER xml with CDATA


I have a table in my database with a column containing xml. The column type is nvarchar(max). The xml is formed in this way

<root>
  <child>....</child>
  .
  .
  <special>
   <event><![CDATA[text->text]]></event>
   <event><![CDATA[text->text]]></event>
  ...
  </special>
</root>

I have not created the db, I cannot change the way information is stored in it but I can retrieve it with a select. For the extraction I use select cast(replace(xml,'utf-8','utf-16')as xml) from table

It works well except for cdata, whose content in the query output is: text -> text

Is there a way to retrieve also the CDATA tags?


Solution

  • Well, this is - as far as I know - not possible on normal ways...

    The CDATA section has one sole reason: include invalid characters within XML for lazy people...

    CDATA is not seen as needed at all and therefore is not really supported by normal XML methods. Or in other words: It is supported in the way, that the content is properly escaped. There is no difference between correctly escaped content and not-escaped content within CDATA actually! (Okay, there are some minor differences like including ]]> within a CDATA-section and some more tiny specialties...)

    The big question is: Why?

    What are you trying to do with this afterwards?

    Try this. the included text is given as is:

    DECLARE @xml XML = 
    '<root>
      <special>
       <event><![CDATA[text->text]]></event>
       <event><![CDATA[text->text]]></event>
      </special>
    </root>'
    
    SELECT t.c.query('text()')
    FROM @xml.nodes('/root/special/event') t(c);
    

    So: Please explain some more details: What do you really want?

    If your really need nothing more than the wrapping CDATA you might use this:

    SELECT '<![CDATA[' + t.c.value('.','varchar(max)') + ']]>'
    FROM @xml.nodes('/root/special/event') t(c);
    

    Update: Same with outdated FROM OPENXML

    I just tried how the outdated approach with FROM OPENXML handles this and found, that there is absolutely no indication in the resultset, that the given text was within a CDATA section originally. The "Some value here" is exactly returned in the same way as the text within CDATA:

    DECLARE @doc XML = 
    '<root>
      <child>Some value here </child>
      <special>
       <event><![CDATA[text->text]]></event>
       <event><![CDATA[text->text]]></event>
      </special>
    </root>';
    
    DECLARE @hnd INT;
    
    EXEC sp_xml_preparedocument @hnd OUTPUT, @doc;  
    
    SELECT * FROM OPENXML (@hnd, '/root',0);
    
    EXEC sp_xml_removedocument @hnd;