Search code examples
sqloracleconnect-by

Connect by prior - is this an Oracle bug?


So either this is an Oracle bug, or I'm being a bit slow today.

This SQL executes fine:

WITH car_paint_options AS (
   SELECT 'Escort'  car_model, 'red,blue' paint_opts FROM dual UNION
   SELECT 'Puma'    car_model, 'black'    paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
       regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM   car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null

However, it gives me the wrong answer (4 lines of data):

 rn level car_model paint_opt
--- ----- --------- ---------
  1     1 Puma      black
  2     1 Focus     red
  3     2 Focus     blue
  4     2 Focus     blue

The desired output is only 3 lines of data as shown here:

 rn level car_model paint_opt
--- ----- --------- ---------
  1     1 Puma      black
  2     1 Focus     red
  3     2 Focus     blue

I understand why the problem occurs. Level 2 records are attempting to connect back to level 1 records, and what's happening is that the Focus:blue option is matching back successfully to both Puma:black and Focus:red parent rows.

So now I'm thinking: "fine, simple enough fix, let's constrain the level 2 objects so that they only ever link back to parent objects of the same car_model":

WITH car_paint_options AS (
   SELECT 'Escort'  car_model, 'red,blue' paint_opts FROM dual UNION
   SELECT 'Puma'    car_model, 'black'    paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
       regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM   car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
       AND ep.car_model = prior ep.car_model

But it results in an error:

ORA-01436: CONNECT BY loop in user data

Tested in both Oracle 12c + 19c. Can someone confirm I'm not doing anything silly here? Am I right in thinking it's a bug?


Solution

  • Option 1

    The normal "hack" for this is to add something to the CONNECT BY clause that gives each row a unique value within a filter that always evaluates to true such as SYS_GUID() IS NOT NULL or DBMS_RANDOM.VALUE() IS NOT NULL and prevents the hierarchical query from detecting cycles:

    WITH car_paint_options (car_model, paint_opts) AS (
       SELECT 'Escort', 'red,blue' FROM dual UNION ALL
       SELECT 'Puma',   'black'    FROM dual
    )
    SELECT row_number() over(order by level) rn, level, ep.car_model,
           regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
    FROM   car_paint_options ep
    CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
    AND        ep.car_model = prior ep.car_model
    AND        PRIOR SYS_GUID() IS NOT NULL
    

    Which outputs:

    RN LEVEL CAR_MODEL PAINT_OPT
    1 1 Escort red
    2 1 Puma black
    3 2 Escort blue

    Option 2

    Regular expression functions are slow and it is often faster to use simple string functions, even if it means you need to type more (and if simple string functions are faster than regular expressions then they are even faster than regular expressions plus generating a GUID for each row):

    WITH car_paint_options (car_model, paint_opts) AS (
       SELECT 'Escort', 'red,blue' FROM dual UNION ALL
       SELECT 'Puma',   'black'    FROM dual
    ),
    bounds (car_model, paint_opts, lvl, spos, epos) AS (
      SELECT car_model,
             paint_opts,
             1,
             1,
             INSTR(paint_opts, ',', 1)
      FROM   car_paint_options
    UNION ALL
      SELECT car_model,
             paint_opts,
             lvl + 1,
             epos + 1,
             INSTR(paint_opts, ',', epos + 1)
      FROM   bounds
      WHERE  epos > 0
    )
    SELECT ROW_NUMBER() OVER (ORDER BY lvl) AS rn,
           lvl,
           car_model,
           CASE epos
           WHEN 0
           THEN SUBSTR(paint_opts, spos)
           ELSE SUBSTR(paint_opts, spos, epos - spos)
           END AS paint_opt
    FROM   bounds
    

    Which outputs the same.

    RN LVL CAR_MODEL PAINT_OPT
    1 1 Escort red
    2 1 Puma black
    3 2 Escort blue

    Option 3

    If you did want to still use regular expressions and want to avoid SYS_GUID hacks then, from Oracle 12, you can generate the level values using a LATERAL join (or CROSS APPLY):

    WITH car_paint_options (car_model, paint_opts) AS (
       SELECT 'Escort', 'red,blue' FROM dual UNION ALL
       SELECT 'Puma',   'black'    FROM dual
    )
    SELECT row_number() over(order by depth) rn,
           depth,
           ep.car_model,
           regexp_substr(ep.paint_opts, '[^,]+', 1, depth) AS paint_opt
    FROM   car_paint_options ep
           CROSS JOIN LATERAL (
             SELECT LEVEL AS depth
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT(ep.paint_opts, '[^,]+')
           )
    

    Which also outputs the same.


    Performance

    A comparison of the performance of different options for splitting delimited strings is given in this answer. The summary is that option 2 tends to be the most performant (and the most to type).

    fiddle