Search code examples
oracle19cregexp-substr

regexp_substr literal max length limit in Oracle


I need to extract values delimited by semicolon in a text. I use regexp_substr which is very powerful but the string has a limit. I have sometimes a text with 10 000 characters so is not possible to use such string in this inbuild function.

the text looks like this - AU783 343; 3N9493 ;113 UN9 WE3 54 ; OI8343, ; 43U.783.3

just 0-9 and A-Z , if there is a colon or period then it should be deleted. the output must be as it is but withou leading and trailing spaces

AU783 343
3N9493
113 UN9 WE3 54
OI8343
43U7833

any suggestion how to avoid the max length limit?


Solution

  • You don't need to use regular expressions. The simple string functions SUBSTR, INSTR and REPLACE are sufficient and work with CLOB values.

    For example, given the table:

    CREATE TABLE table_name (
      id    NUMBER
            GENERATED ALWAYS AS IDENTITY
            PRIMARY KEY,
      value CLOB
    );
    

    You can extract all the substrings using:

    WITH bounds (id, value, s_pos, e_pos) AS (
      SELECT id,
             value,
             1,
             INSTR(value, ';', 1)
      FROM   table_name
    UNION ALL
      SELECT id,
             value,
             e_pos + 1,
             INSTR(value, ';', e_pos + 1)
      FROM   bounds
      WHERE  e_pos > 0
    )
    SEARCH DEPTH FIRST BY id SET id_order
    SELECT id,
           REPLACE(
             REPLACE(
               CASE e_pos
               WHEN 0
               THEN SUBSTR(value, s_pos)
               ELSE SUBSTR(value, s_pos, e_pos - s_pos)
               END,
               ':'
             ),
             '.'
           ) AS value
    FROM   bounds;
    

    See db<>fiddle for an example with very large strings.