Search code examples
sqlxmloracleparsingaggregation

Aggregate values of XML tags via SQL query


I need to aggregate some sub-tag values into a Oracle SQL column.

Let's say the XML structure is something like this:

<Product>
[..other tags..]
 <Attributes>
   <Statements Id="1" Name="Statement 0">
     <Statement Id="4">Subtype 1</Statement>
   </Statements>
   <Statements Id="3" Name="Statement 1">
     <Statement Id="4">Subtype 4</Statement>
     <Statement Id="5">Subtype 5</Statement>
     <Statement Id="15">Subtype 15</Statement>
   </Statements>
   <Statements Id="16" Name="Statement 2">
     <Statement Id="4">Subtype 4</Statement>
     <Statement Id="5">Subtype 5</Statement>
     <Statement Id="10">Subtype 10</Statement>
   </Statements>
 </Attributes>
</Product>

My Product table in SQL should be populated with something like this: Statement_2 with entry Subtype4,Subtype5,Subtype10... etc..

I actually tried to use some query like the following:

SELECT 
    b.id,
    LISTAGG(x2.statement_1, ',') WITHIN GROUP (ORDER BY NULL) AS statement 1,
    LISTAGG(y2.statement_2, ',') WITHIN GROUP (ORDER BY NULL) AS statement_2
FROM 
    TABLE_WHERE_XML_RESIDES b

CROSS JOIN 
    XMLTABLE('/root_path/Statements[@Id="3"]' 
             PASSING XMLTYPE(b.XML_DOCUMENT)
             COLUMNS statement1_list XMLTYPE PATH 'Statement') x
CROSS JOIN 
    XMLTABLE('/Statement' 
             PASSING x.statement1_list 
             COLUMNS statement_name_1 VARCHAR2(100) PATH 'text()') x2

CROSS JOIN 
   XMLTABLE('/root_path/Statements[@Id="16"]' 
             PASSING XMLTYPE(b.XML_DOCUMENT)
             COLUMNS statement1_list XMLTYPE PATH 'Statement') y

CROSS JOIN 
    XMLTABLE('/Statement' 
             PASSING x.statement2_list 
             COLUMNS statement_name_1 VARCHAR2(4000) PATH 'text()') y2
GROUP BY 
    b.id;

In this way the aggregation kind of of works, but for many <Statements> the query will be not performant I guess. And main problem is, due to the join mechanisms subsequent statements values (i.e. statements 2) would be duplicated for N times (which correspond to the N entries already retrieved for the column in statement1.


Solution

  • You can use or operator in the XPath expression and use conditional aggregation to aggregate each Id within a single parse:

    SELECT b.id,
           x.*
    FROM   TABLE_WHERE_XML_RESIDES b
           CROSS JOIN LATERAL (
             SELECT LISTAGG(CASE id WHEN  3 THEN value END, ', ') AS statement3,
                    LISTAGG(CASE id WHEN 16 THEN value END, ', ') AS statement16
             FROM   XMLTABLE(
                      '/Product/Attributes/Statements[@Id="3" or @Id=16]/Statement' 
                      PASSING XMLTYPE(b.XML_DOCUMENT)
                      COLUMNS 
                        id    NUMBER       PATH './../@Id',
                        value VARCHAR2(20) PATH './text()'
                      ) x
           ) x;
    

    Which, for the sample data:

    CREATE TABLE TABLE_WHERE_XML_RESIDES (id, xml_document) AS
    SELECT 1, EMPTY_CLOB() || '<Product>
     <Attributes>
       <Statements Id="1" Name="Statement 0">
         <Statement Id="4">Subtype 1</Statement>
       </Statements>
       <Statements Id="3" Name="Statement 1">
         <Statement Id="4">Subtype 4</Statement>
         <Statement Id="5">Subtype 5</Statement>
         <Statement Id="15">Subtype 15</Statement>
       </Statements>
       <Statements Id="16" Name="Statement 2">
         <Statement Id="4">Subtype 4</Statement>
         <Statement Id="5">Subtype 5</Statement>
         <Statement Id="10">Subtype 10</Statement>
       </Statements>
     </Attributes>
    </Product>' FROM DUAL;
    

    Outputs:

    ID STATEMENT3 STATEMENT16
    1 Subtype 4, Subtype 5, Subtype 15 Subtype 4, Subtype 5, Subtype 10

    fiddle