Search code examples
regexoracle-databasesubstroracle12cregexp-substr

How to split a CLOB object using , and : delimiter in Oracle into multiple records


I have a CLOB object sample as shown below. I want to first split this by using delimiter "," and save it in a temporary table for later use.

ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0

I want to save the result in the below format in each row.

Column_Name
__________________________
ABCDEF:PmId12345RmLn1VlId0
ABCDEF:PmId12345RmLn1VlId0
ABCDEF:PmId12345RmLn1VlId0

I tried using REGEXP_SUBSTR function

select 
    regexp_substr('ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0', '[^,]+', 1, 1) Column_Name 
from dual;

The above query gives me single record like below

Column_Name
__________________________
ABCDEF:PmId12345RmLn1VlId0

Can anyone help me solve this issue.


Solution

  • Here is a solution using a recursive factored subquery (Oracle 11.2 and above):

    with inputs ( str ) as (
           select to_clob('ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0')
           from dual
         ),
         prep ( s, n, token, st_pos, end_pos ) as (
           select ',' || str || ',', -1, null, null, 1
             from inputs
           union all
           select s, n+1, substr(s, st_pos, end_pos - st_pos),
                  end_pos + 1, instr(s, ',', 1, n+3)
             from prep
             where end_pos != 0
         )
    select n as idx, token as column_name
    from   prep
    where  n > 0;
    
    
    
       IDX COLUMN_NAME
    ------ ----------------------------
         1 ABCDEF:PmId12345RmLn1VlId0
         2 ABCDEF:PmId12345RmLn1VlId0
         3 ABCDEF:PmId12345RmLn1VlId0
         4 ABCDEF:PmId12345RmLn1VlId0
         5 ABCDEF:PmId12345RmLn1VlId0
    

    Notes:

    You said CLOB but in your example you extracted from a varchar2 string. I added to_clob() to see if/how this works on a CLOB.

    I used instr and substr, as they often (usually?) perform between better and much better than their regexp equivalents.

    I saved the "index" of each substring within the input string; in some cases the order of the tokens in the input string is important. (Not in your example though, you just had the same token repeated five times.)

    If you need better performance, especially if your CLOBs are very large, you may be better off using dbms_lob.substr and dbms_lob.instr - see Performance of SUBSTR on CLOB, especially Alex Poole's answer, and documentation here: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_lob.htm#BABEAJAD. Note the syntax differences vs regular substr / instr.