Search code examples
sqlxmloracle-databasexpathxmltype

How to extract element-path from XMLType Node?


I would like to have a select statement on an XML document and one column should return me the path of each node.

For example, given the data

SELECT * 
FROM TABLE(XMLSequence(
  XMLTYPE('<?xml version="1.0"?>
    <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
    </users>').extract('/*//*[text()]'))) t;

Which results in

column_value
--------
<user><name>user1</name></user>
<user><name>user2</name></user>
<user><name>user3</name></user>
<user><name>user4</name></user>

I'd like to have a result like this:

path                     value
------------------------ --------------
/users/user/name         user1
/users/user/name         user2
/users/group/user/name   user3
/users/user/name         user4

I can not see how to get to this. I figure there are two thing that have to work together properly:

  • Can I extract the path from an XMLType with a single operation or method, or do I have to do this with string-magic?
  • What is the correct XPath expression so that I do get the whole element path (if thats possible), eg. <users><group><user><name>user3</name></user></group></user> insead of <user><name>user3</name></user>?

Maybe I am not understanding XMLType fully, yet. It could be I need a different approach, but I can not see it.

Sidenotes:

  • In the final version the XML document will be coming from CLOBs of a table, not a static document.
  • The path column can of course also use dots or whatever and the initial slash is not the issue, any representation would do.
  • Also I would not mind if every inner node also gets a result row (possibly with null as value), not only the ones with text() in it (which is what I am really interested in).
  • In the end I will need the tail element of path separate (always "name" in the example here, but this will vary later), i.e. ('/users/groups/user', 'name', 'user3'), I can deal with that separately.

Solution

  • You can achieve that with help of XMLTable function from Oracle XML DB XQuery function set:

    select * from 
      XMLTable(
        '
         declare function local:path-to-node( $nodes as node()* )  as xs:string* {
           $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
         };
         for $i in $rdoc//name 
           return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>
        '
        passing 
        XMLParse(content '
          <users><user><name>user1</name></user>
               <user><name>user2</name></user>
               <group>
                  <user><name>user3</name></user>
               </group>
               <user><name>user4</name></user>
          </users>'
        )
        as "rdoc"
        columns 
          name_path  varchar2(4000) path '//ret/name_path',
          name_value varchar2(4000) path '//ret/name'
    
      )
    

    For me XQuery looks at least more intuitive for XML data manipulation than XSLT.

    You can find useful set of XQuery functions here.

    Update 1

    I suppose that you need totally plain dataset with full data at last stage. This target can be reached by complicated way, constructed step-by-step below, but this variant is very resource-angry. I propose to review final target (selecting some specific records, count number of elements etc.) and after that simplify this solution or totally change it.

    Update 2

    All steps deleted from this Update except last because @A.B.Cade proposed more elegant solution in comments. This solution provided in Update 3 section below.

    Step 1 - Constructing dataset of id's with corresponding query results

    Step 2 - Aggregating to single XML row

    Step 3 - Finally get full plain dataset by querying constracted XML with XMLTable

    with xmlsource as (
      -- only for purpose to write long string only once
      select '
          <users><user><name>user1</name></user>
               <user><name>user2</name></user>
               <group>
                  <user><name>user3</name></user>
               </group>
               <user><name>user4</name></user>
          </users>' xml_string
       from dual   
    ),
    xml_table as ( 
      -- model of xmltable
      select 10 id, xml_string xml_data from xmlsource union all 
      select 20 id, xml_string xml_data from xmlsource union all 
      select 30 id, xml_string xml_data from xmlsource 
    ) 
    select  *
    from
      XMLTable(
        '
            for $entry_user in $full_doc/full_list/list_entry/name_info
              return <tuple>
                       <id>{data($entry_user/../@id_value)}</id>
                       <path>{$entry_user/name_path/text()}</path>
                       <name>{$entry_user/name_value/text()}</name>
                      </tuple> 
        '
        passing ( 
          select  
            XMLElement("full_list", 
              XMLAgg(     
                XMLElement("list_entry",
                  XMLAttributes(id as "id_value"),
                  XMLQuery(
                    '
                     declare function local:path-to-node( $nodes as node()* )  as xs:string* {
                       $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
                     };(: function to construct path :) 
                     for $i in $rdoc//name return <name_info><name_path>{local:path-to-node($i)}</name_path><name_value>{$i/text()}</name_value></name_info>
                    '
                    passing by value XMLParse(content xml_data) as "rdoc"
                    returning content
                  )
                )
              )
            )        
            from xml_table
        )   
        as "full_doc"      
        columns
          id_val   varchar2(4000) path '//tuple/id',
          path_val varchar2(4000) path '//tuple/path',
          name_val varchar2(4000) path '//tuple/name'
      )    
    

    Update 3

    As mentioned by @A.B.Cade in his comment, there are really simple way to join ID's with XQuery results.

    Because I don't like external links in answers, code below represents his SQL fiddle, a little bit adapted to the data source from this answer:

    with xmlsource as (
      -- only for purpose to write long string only once
      select '
          <users><user><name>user1</name></user>
               <user><name>user2</name></user>
               <group>
                  <user><name>user3</name></user>
               </group>
               <user><name>user4</name></user>
          </users>' xml_string
       from dual   
    ),
    xml_table as ( 
      -- model of xmltable
      select 10 id, xml_string xml_data from xmlsource union all 
      select 20 id, xml_string xml_data from xmlsource union all
      select 30 id, xml_string xml_data from xmlsource
    )
    select xd.id, x.*  from
    xml_table xd,
      XMLTable(
        'declare function local:path-to-node( $nodes as node()* )  as xs:string* {$nodes/string-join(ancestor-or-self::*/name(.), ''/'')     };     for $i in $rdoc//name        return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>    '
        passing
        XMLParse(content xd.xml_data
        )
        as "rdoc"
        columns
          name_path  varchar2(4000) path '//ret/name_path',
          name_value varchar2(4000) path '//ret/name'
    
      ) x