Search code examples
xmloracle-databaseplsqlclob

In Oracle, how to extract values from XML/CLOB field that has multiple values?


I am trying to use the Oracle function EXTRACTVALUE() along with xmltype() to extract field values from a CLOB data type. My problem is that because there are multiple values in some rows, and I'm not sure how to make EXTRACTVALUE() work on these. In the example below, there are two cases -- the syntax works correctly on the first row, but not on the second row (I get a parsing failure message because there are multiple fields/values). I would like to just take the most recent (last) set of values in cases where multiple values exist.

create table TEST1 
(
  id1         numeric(2),
  col1        CLOB not null  
);

-- Insert data:
INSERT INTO TEST1 (id1, col1) values (1, '<raceHistory><raceDate>1980-05-26</raceDate><raceType>CLASSIC</raceType></raceHistory>');
INSERT INTO TEST1 (id1, col1) values (2, '<raceHistory><raceDate>1997-06-21</raceDate><raceType>MARATHON</raceType></raceHistory><raceHistory><raceDate>2017-01-01</raceDate><raceType>SKATE</raceType></raceHistory>');

-- Make sure it populated correctly.
select * from TEST1;

-- Now try to extract the field values for raceDate and raceType:
-- The parsing works fine for the first row...
SELECT 
    EXTRACTVALUE(xmltype(col1), '/raceHistory/raceDate') as raceDate,
    EXTRACTVALUE(xmltype(col1), '/raceHistory/raceType') as raceType
FROM TEST1
WHERE id1 = 1;

-- ... but the parsing fails on this one because there are multiple values:
SELECT 
    EXTRACTVALUE(xmltype(col1), '/raceHistory/raceDate') as raceDate,
    EXTRACTVALUE(xmltype(col1), '/raceHistory/raceType') as raceType
FROM TEST1
WHERE id1 = 2;

My desired output would be this – for the second row, I just want the last value (i.e., the most recent rate schedule variable).

id1 raceDate    raceType
1   1985-05-25  CLASSIC
2   2017-01-01  SKATE

Solution

  • Well, one problem is that your strings aren't valid XML - they need a single root node enclosing the rest of the string. It doesn't matter what name you give the root node. I'm not sure what Oracle version you're on, but in 12c xmltype() will fail with LPX-00245: extra data after end of document if there's no root node.

    That said, all you need is to change your XPath to e.g. /raceHistory[last()]/raceDate to get the last raceHistory entry.

    Here's an example where I added a root node to your xml:

    with test1 as (select 2 as id1, '<root><raceHistory><raceDate>1997-06-21</raceDate><raceType>MARATHON</raceType></raceHistory><raceHistory><raceDate>2017-01-01</raceDate><raceType>SKATE</raceType></raceHistory></root>' as col1 from dual)
    SELECT 
        EXTRACTVALUE(xmltype(col1), '/root/raceHistory[last()]/raceDate') as raceDate,
        EXTRACTVALUE(xmltype(col1), '/root/raceHistory[last()]/raceType') as raceType
    FROM TEST1
    WHERE id1 = 2;
    

    Note that this doesn't get the raceHistory entry with the latest raceDate - that would be more complicated. It just gets the last raceHistory entry in the string. In your case they're already sorted that way; I just wanted to clarify that.