Search code examples
oracle-databaseoracle12cxmltype

How to find list of all elements&attributes in a xml file from a Oracle table clob column?


I am trying to find all tags(including its XPATH) in an XML file which is placed in a CLOB Column in an oracle table. Could you please provide a query for this.


Solution

  • A list of unique tags can be generated with an XSL style sheet.

    First create a simple table with an XML stored as a CLOB, using a random sample XML:

    create table sample_xml(id number, some_xml clob);
    
    insert into sample_xml values(1, q'[<?xml version="1.0"?>
    <catalog>
       <book id="bk101">
          <author>Gambardella, Matthew</author>
          <title>XML Developer's Guide</title>
       </book>
       <book id="bk102">
          <author>Ralls, Kim</author>
          <title>Midnight Rain</title>
       </book>
    </catalog>]');
    

    Now use XMLTRANSFORM to convert the XML to a distinct set of tags, using the XML style sheet from this question:

    select xmltransform(
        xmltype(some_xml), 
        xmltype('<?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="2.0">
        <xsl:output method="text"/>
        <xsl:strip-space elements="*" />
        <xsl:key name="kElemByName" match="*" use="name(.)"/>
        <xsl:template match="
            *[generate-id()
            =
            generate-id(key(''kElemByName'', name(.))[1])
            ]">
            <xsl:value-of select="concat(name(.), '''||chr(38)||'#xA;'')"/>
            <xsl:apply-templates select="*"/>
        </xsl:template>
        <xsl:template match="text()"/>
    </xsl:stylesheet>')) tags
    from sample_xml;
    
    
    TAGS
    ----
    catalog
    book
    author
    title
    

    You can get the results one-line-at-a-time by using regexp_substr to split the string:

    select regexp_substr(replace(tags, chr(10), ','),'[^,]+', 1, level) from
    (
        ... insert large query here
    )
    connect by regexp_substr(replace(tags, chr(10), ','),'[^,]+', 1, level) is not null;
    

    To see the attributes as well, check out the linked question, it also includes XML style sheets for getting attributes.