Search code examples
sqloracle-databaseplsqlregexp-substr

Split comma separated values with line gaps and nulls into columns in table via pl/sql procedure


I have a string, clob value in table which i need to split into columns . Source table query:

Insert into disp_data(id,data) values(100,
'"Project title as per the outstanding Requirements","The values are not with respect to the requirement and analysis done by the team. 
Also it is difficult to prepare a scenario notwithstanding the fact it is difficult. This user story is going to be slightly complex however it is up to the team","Active","Disabled","25 tonnes of fuel","www.examplesites.com/html.asp&net;","","","","","25"');

In the clob column value there are spaces, null value and line gaps also. So when i try splitting it using

select regexp_substr(data,'[^,]+',1,level) from disp_data 
connect by regexp_substr(data,'[^,]+',1,level) is not null.

Problem is for the large text with line gaps, it is splitting it into different rows. I had thought of using the above result set and pivot but am unable to.

I need to get this data as columns and push in the destination table-push_data_temp.

select pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11 from push_data_temp;

The clob column has 11 comma separated values that need to be pushed into this table as columns. This entire process needs to be done via pl/sql procedure.

The result in push_data_temp should look like this. enter image description here

ANy help would be much appreciated. DB is oracle 19c


Solution

  • Your regular expression needs to allow for nulls, i.e. consecutive commas (but hopefully you don't have commas within any of the quoted strings...). If you have multiple source rows then it's easier to split with a recursive CTE:

    with rcte (id, data, lvl, result) as (
      select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
      from disp_data
      union all
      select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
      from rcte
      where lvl <= regexp_count(data, ',')
    )
    select id, lvl, result
    from rcte
    order by id, lvl;
    

    You can then pivot the result into the columns you want:

    with rcte (id, data, lvl, result) as (
      select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
      from disp_data
      union all
      select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
      from rcte
      where lvl <= regexp_count(data, ',')
    )
    select *
    from (
      select id, lvl, result
      from rcte
    )
    pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));
    

    db<>fiddle

    And you can use that directly in an insert statement:

    insert into push_data_temp (pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11)
    with rcte (id, data, lvl, result) as (
      select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
      from disp_data
      union all
      select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
      from rcte
      where lvl <= regexp_count(data, ',')
    )
    select *
    from (
      select id, lvl, result
      from rcte
    )
    pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));
    

    db<>fiddle

    No PL/SQL needed, but you can still wrap it in a procedure if you want to.


    I have to take as clob and it is throwing error as inconsistent datatype

    You need to cast the tokens as varchar2, which limits their length (either 4k or 32k depending on Oracle version and settings):

    with rcte (id, data, lvl, result) as (
      select id, data, 1,
        cast(regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1) as varchar2(4000))
      from disp_data
      union all
      select id, data, lvl + 1,
        cast(regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1) as varchar2(4000))
      from rcte
      where lvl <= regexp_count(data, ',')
    )
    ...
    

    db<>fiddle with CLOB (and connect-by examples removed, as they break it...)


    when i try for text with commas in between, it splits data unevenly.

    That's why I said "hopefully you don't have commas within any of the quoted strings". As you don't have any really empty elements - you have ...","","... rather than ...,,... - you can skip the concern about those I suppose, and use a different pattern:

    with rcte (id, data, lvl, result) as (
      select id, data, 1,
        cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, 1, null, 1) as varchar2(4000))
      from disp_data
      union all
      select id, data, lvl + 1,
        cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, lvl + 1, null, 1) as varchar2(4000))
      from rcte
      where lvl <= regexp_count(data, '("[^"]*"|[^,]+)')
    )
    ...
    

    db<>fiddle

    If you did have to deal with null elements then it's still possible, but more work. This also won't deal with escaped double-quotes without strings. At some point it will be easier to write your own parser in PL/SQL; or even to write the data to disk and read it back in as an external table which can handle all of this for you.