Search code examples
sqlregexoracle-databaseplsqloracle12c

How create a function for converting a CSV value into a table, using regular expression as a CSV separator?


I need a generic Oracle function which takes a CSV string as a first parameter and a regular expression string, which defines a CSV separator as a second parameter and returns a table of parsed strings like as follows:

INPUT data:

NAME    PROJECT     ERROR
108     test        string-1, string-2 ; string-3
109     test2       single string
110     test3       ab,  ,c

OUTPUT data:

NAME    PROJECT     ERROR
108     test        string-1
108     test        string-2
108     test        string-3
109     test2       single string
110     test3       ab
110     test3       NULL
110     test3       c

the separators might be different in different source tables, so I'd like to be able to specify them dynamically as a regex.

How can I create a generic function out of the following code:

with temp as
(
    select 108 Name, 'test' Project, 'string-1 , string-2 ; string-3' Error  from dual
    union all
    select 109, 'test2', 'single string' from dual
)
select distinct
  t.name, t.project,
  trim(regexp_substr(t.error, '[^,;]+', 1, levels.column_value))  as error
from 
  temp t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,;]+'))  + 1) as sys.OdciNumberList)) levels
order by name;

sql<>fiddle

So I'm thinking of a function that takes the following parameters and returns a table of strings

CREATE OR REPLACE FUNCTION csvstr2tab(
    p_str      IN VARCHAR2,
    p_sep_re   IN VARCHAR2   DEFAULT '\s*[,;]\s*'
)

PS I have used this answer


UPDATE: please note that I'm using the abbreviation "CSV" here just in order to explain that the input string has multiple values, separated by different separators. I'm dealing with a free text, written by human beings, that used different separators. So the input string doesn't have to be a correct CSV in my case - it's just a string separated by multiple different separators.


Solution

  • Perhaps something like this. I wrote it as a PL/SQL function, as you requested, but note that if the input data resides in the database, this can be done directly in SQL.

    For illustration, I called the function with the default separator.

    If you are not familiar with pipelined table function, you can read about them in the documentation.

    Note also that in Oracle 12.2, but not in 12.1, you can leave out the table( ) operator - you can select directly "from the function".

    create type str_t as table of varchar2(4000);
    /
    
    create or replace function csvstr2tab(
      p_str    in varchar2,
      p_sep_re in varchar2 default '\s*[,;]\s*'
    )
      return str_t
      pipelined
    as
    begin
      for i in 1 .. regexp_count(p_str, p_sep_re) + 1 loop
        pipe row (regexp_substr(p_str, '(.*?)(' || p_sep_re || '|$)', 1, i, null, 1));
      end loop;
      return;
    end;
    /
    
    select *
    from   table(csvstr2tab('blue  ;green,,brown;,yellow;'))
    ;
    
    COLUMN_VALUE
    --------------------
    blue
    green
    [NULL]
    brown
    [NULL]
    yellow
    [NULL]
    

    One more test (note that the first row in the output has two trailing spaces, too):

    select *
    from   table(csvstr2tab('blue  ;green,,brown;,yellow;', ';'))
    ;
    
    COLUMN_VALUE
    -----------------
    blue  
    green,,brown
    ,yellow
    

    EDIT

    Here is how the function can be used to break input strings into tokens when the inputs are in a table (rows identified by an ID, for example), and keep track of token order as well.

    with
      sample_data(id, str) as (
        select 1201, 'blue  ;green,,brown;,yellow;' from dual union all
        select 1202, 'tinker, tailor, soldier, ...' from dual
      )
    select sd.id, sd.str, tf.ord, tf.token
    from   sample_data sd,
           lateral ( select rownum as ord, column_value as token
                     from   table(csvstr2tab(sd.str))
                   ) tf
    order by id, ord
    ;
    
        ID STR                             ORD TOKEN   
    ------ ---------------------------- ------ --------
      1201 blue  ;green,,brown;,yellow;      1 blue    
      1201 blue  ;green,,brown;,yellow;      2 green   
      1201 blue  ;green,,brown;,yellow;      3         
      1201 blue  ;green,,brown;,yellow;      4 brown   
      1201 blue  ;green,,brown;,yellow;      5         
      1201 blue  ;green,,brown;,yellow;      6 yellow  
      1201 blue  ;green,,brown;,yellow;      7         
      1202 tinker, tailor, soldier, ...      1 tinker  
      1202 tinker, tailor, soldier, ...      2 tailor  
      1202 tinker, tailor, soldier, ...      3 soldier  
      1202 tinker, tailor, soldier, ...      4 ...