Search code examples
oracleconnect-by

Expanding Oracle rows with comma-delimited values into multiple rows


I have a table in Oracle like the following:

KEY,VALS
k1,"a,b"

I need it to look like:

KEY,VAL
k1,a
k1,b

I did this with CONNECT BY and LEVEL, following an example:

with t as (
    select 'k1' as key, 'a,b' as vals
    from dual
)
select key, regexp_substr(vals, '[^,]+', 1, level) as val
from t
connect by LEVEL <= length(vals) - length(replace(vals, ',')) + 1

But when I have multiple rows in the table, and the vals can be comma-delimited values of different lengths, like:

KEY,VALS
k1,"a,b"
k2,"c,d,e"

I'm looking for a result like:

KEY,VAL
k1,a
k1,b
k2,c
k2,d
k2,e

But the naive approach above doesn't work because every level is connected with the one above it, resulting in:

with t as (
    select 'k1' as key, 'a,b' as vals
    from dual
    union
    select 'k2' as key, 'c,d,e' as vals
    from dual
)
select key, regexp_substr(vals, '[^,]+', 1, level) as val
from t
connect by LEVEL <= length(vals) - length(replace(vals, ',')) + 1
KEY,VAL
k1,a
k1,b
k2,e
k2,d
k2,e
k2,c
k1,b
k2,e
k2,d
k2,e

I suspect I need some kind of CONNECT BY PRIOR condition, but I'm not sure what. When trying to match by keys:

connect by prior key = key
       and LEVEL <= length(vals) - length(replace(vals, ',')) + 1

I get an ORA-01436: CONNECT BY loop in user data error.

What's the right approach here?


Solution

  • Option 1: Simple, fast string functions and a recursive query:

    with t (key, vals) as (
        SELECT 'k1', 'a,b'   FROM DUAL UNION ALL
        SELECT 'k2', 'c,d,e' FROM DUAL
    ),
    bounds (key, vals, spos, epos) AS (
      SELECT key, vals, 1, INSTR(vals, ',', 1)
      FROM t
    UNION ALL
      SELECT key, vals, epos + 1, INSTR(vals, ',', epos + 1)
      FROM bounds
      WHERE  epos > 0
    )
    SEARCH DEPTH FIRST BY key SET key_order
    SELECT key,
           CASE epos
           WHEN 0
           THEN SUBSTR(vals, spos)
           ELSE SUBSTR(vals, spos, epos - spos)
           END AS val
    FROM   bounds;
    

    Option 2: Slower regular expressions in a LATERAL joined hierarchical query

    This option requires Oracle 12 or later.

    with t (key, vals) as (
        SELECT 'k1', 'a,b'   FROM DUAL UNION ALL
        SELECT 'k2', 'c,d,e' FROM DUAL
    )
    SELECT key, val
    FROM   t
           LEFT OUTER JOIN LATERAL (
             SELECT regexp_substr(vals, '[^,]+', 1, level) AS val
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT(vals, '[^,]+')
           )
           ON (1 = 1)
    

    Option 3: Recursive query correlating to parent rows.

    This option is the slowest of the options as it needs to correlate between levels of the hierarchy and generate a GUID at each step (which is seemingly useless but prevents unnecessary recursion).

    with t (key, vals) as (
        SELECT 'k1', 'a,b'   FROM DUAL UNION ALL
        SELECT 'k2', 'c,d,e' FROM DUAL
    )
    SELECT key,
           regexp_substr(vals, '[^,]+', 1, level) AS val
    FROM   t
    CONNECT BY LEVEL <= REGEXP_COUNT(vals, '[^,]+')
    AND PRIOR key = key
    AND PRIOR SYS_GUID() IS NOT NULL;
    

    Which all output:

    KEY VAL
    k1 a
    k1 b
    k2 c
    k2 d
    k2 e

    db<>fiddle here