I am trying to find all tags(including its XPATH) in an XML file which is placed in a CLOB Column in an oracle table. Could you please provide a query for this.
A list of unique tags can be generated with an XSL style sheet.
First create a simple table with an XML stored as a CLOB, using a random sample XML:
create table sample_xml(id number, some_xml clob);
insert into sample_xml values(1, q'[<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
</book>
</catalog>]');
Now use XMLTRANSFORM
to convert the XML to a distinct set of tags, using the XML style sheet from this question:
select xmltransform(
xmltype(some_xml),
xmltype('<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*" />
<xsl:key name="kElemByName" match="*" use="name(.)"/>
<xsl:template match="
*[generate-id()
=
generate-id(key(''kElemByName'', name(.))[1])
]">
<xsl:value-of select="concat(name(.), '''||chr(38)||'#xA;'')"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>')) tags
from sample_xml;
TAGS
----
catalog
book
author
title
You can get the results one-line-at-a-time by using regexp_substr
to split the string:
select regexp_substr(replace(tags, chr(10), ','),'[^,]+', 1, level) from
(
... insert large query here
)
connect by regexp_substr(replace(tags, chr(10), ','),'[^,]+', 1, level) is not null;
To see the attributes as well, check out the linked question, it also includes XML style sheets for getting attributes.