Search code examples
xmloracle-databasexmltype

Extra whitespace in XML file read from Oracle database - why?


I am experimenting with Python and Oracle XML DB. I have a table with an XMLType column and an ID column in an Oracle 11g database. The storage model for the XML column is object relational. Sometimes I need to get a whole XML file, and often it is longer than 4000 characters, so I use this query to get a CLOB:

select t.representation.getclobval()
from myxmldocs t 
where id=:documentId

When I run this query the output includes extra whitespace, with newlines and tabs between XML elements that were definitely not there in the XML docs I inserted. The effect is of some kind of formatting, so that the output looks like this:

<A>\n
\t<B></B>\n
\t\t<C>Some text</C>\n
\t\t<C>Some more text</C>\n
\t<B></B>\n
...

and so on. Quite pretty and readable, but why am I getting it? It also messes other libraries that I am using that choke on the extra whitespaces.

If I remove getclobval() my Python client does not get a CLOB but an Object and I don't know what to do with it.

This appears consistent; I get this problem using the sqlplus command line client, and also creating other tables using different XML Schemas, and then querying them. In a previous version of my prototype I had the XMLType column use a CLOB storage model and didn't have this problem.

How should I rewrite the query to just get a CLOB with the XML file without the extra formatting?


Update: as requested in the comments, this is the output I get running the query select dump(t.representation) from myxmldocs t where id=:documentId from the command line client (replacing of course :documentId with an actual, existing ID from the database):

DUMP(T.REPRESENTATION)
--------------------------------------------------------------------------------
Typ=58 Len=218: 32,156,148,1,0,0,0,0,80,193,223,20,0,0,0,0,216,15,47,21,0,0,0,0,
80,44,55,21,0,0,0,0,0,202,154,59,160,15,0,0,160,15,0,0,1,0,4,0,220,190,195,71,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,174,33,65,0,15,0,72,0,1,0,0,0,0,0,0,0,49,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0

Solution

  • in 11g you can use xmlserialize (in fact you should not use getclobval anymore. it's not recommended for performance reasons)

     SQL> select t.test.getclobval() from testxml t where id = 1;
    
    T.TEST.GETCLOBVAL()
    --------------------------------------------------------------------------------
    <A>
      <B>
        <C>foo</C>
        <C>foo2</C>
      </B>
    </A>
    
    
    SQL> select xmlserialize(document t.test as clob no indent) from testxml t where id = 1;
    
    XMLSERIALIZE(DOCUMENTT.TESTASCLOBNOINDENT)
    --------------------------------------------------------------------------------
    <A><B><C>foo</C><C>foo2</C></B></A>