Search code examples
oracle-databaseplsqlsplitclob

How do we split a CLOB (with some lines with more than 32K characters) line by line via PLSQL?


I am trying to split a huge CLOB which contains lines with more than 32K characters.

I tried to use this

SELECT REGEXP_SUBSTR(file_cont, '[^'||chr(10)||']+', 1, LEVEL) AS substr
from data_tab where interface = 'Historical'
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(file_cont, '[^'||chr(10)||']+')) + 1

The table data_tab contains some files with pipe as a separator. The column file_cont is a clob which contains the file we are interested in. However, when I try to execute the above query, it looks like there is an infinite loop.

For information, the CLOB contains more than 600 lines.

What I want to do is to split the clob, line by line into distinct CLOB. Do you know a query that can display this result without falling into an infinite loop?

EDIT : The file's size is 22MB.

Thank you in advance.


Solution

  • You can use a PL/SQL function to read the and split the value:

    If you have the data type:

    CREATE TYPE clob_table AS TABLE OF CLOB;
    

    Then the function:

    CREATE FUNCTION split_clob(
      p_value     IN CLOB,
      p_delimiter IN VARCHAR2 DEFAULT ','
    ) RETURN clob_table PIPELINED
    IS
      v_start  PLS_INTEGER;
      v_next   PLS_INTEGER;
      v_len    PLS_INTEGER;
    BEGIN
      v_start := 1;
      LOOP
        v_next := DBMS_LOB.INSTR( p_value, p_delimiter, v_start );
        v_len  := CASE v_next WHEN 0 THEN LENGTH( p_value ) + 1 ELSE v_next END - v_start;
        PIPE ROW ( SUBSTR( p_value, v_start, v_len ) );
        EXIT WHEN v_next = 0;
        v_start := v_next + LENGTH(p_delimiter);
      END LOOP;
    END;
    /
    

    For the sample data:

    CREATE TABLE table_name ( value CLOB );
    
    DECLARE
      v_value TABLE_NAME.VALUE%TYPE := EMPTY_CLOB();
    BEGIN
      FOR ch IN 65 .. 68 LOOP
        FOR i IN 1 .. 10 LOOP
          v_value := v_value || RPAD( CHR(ch), 4000, CHR(ch) );
        END LOOP;
        IF ch < 68 THEN
          v_value := v_value || CHR(10);
        END IF;
      END LOOP;
      INSERT INTO table_name ( value ) VALUES ( v_value );
    END;
    /
    

    Then the output of:

    SELECT SUBSTR( s.column_value, 1, 10 ) AS value,
           LENGTH( s.column_value ) AS len
    FROM   table_name t
           CROSS APPLY TABLE( split_clob( t.value, CHR(10) ) ) s
    

    Is:

    VALUE LEN
    AAAAAAAAAA 40000
    BBBBBBBBBB 40000
    CCCCCCCCCC 40000
    DDDDDDDDDD 40000

    db<>fiddle here