Search code examples
sqloracle-databasecross-joinrownumregexp-replace

How to create new records from a single record with a comma delimited value field using SQL (in Oracle Database)


Suppose I have a single record in Table test_tabfile, with default_sort field contains comma delimited values:

table_name     schema_name     default_sort
------------   -------------   -------------------------------------- 
Population     2017            GEOTYPE, STATE, COUNTY, ZIP, MSA, CSA

The following legacy SQL script separates values in the default_sort field and creates new records with distinct values field, index_column. How does it achieve this?

I am trying to understand it and rewrite it more intuitively. Thanks.

SELECT schema_name AS schemaname,
       table_name AS tablename,
       REGEXP_SUBSTR (default_sort, '[^,]+', 1, rn) AS index_column
  FROM test_tabfile
       CROSS JOIN
       (    SELECT ROWNUM rn
              FROM (SELECT   MAX (LENGTH (REGEXP_REPLACE (default_sort, '[^,]+'))) + 1 mx
                      FROM test_tabfile)
        CONNECT BY LEVEL <= mx);

Here are the sample output data:

Schmemaname  tablename    index_column
-----------  -----------  ------------
2017         Population   GEOTYPE
2017         Population   STATE
2017         Population   COUNTY
2017         Population   ZIP
2017         Population   MSA
2017         Population   CSA

Solution

  • CONNECT BY is 'intended' to be used to navigate a hierarchy, eg

    select ...
    from   EMP
    start with MGR is null  -- ie, the CEO
    connect by prior MGR = EMPNO   -- ie, link an employee to his manager
    

    but a while ago, people worked it could be used to traverse a "fictional" hierarchy, by

    • realising that START WITH is optional
    • giving an 'always true' condition for connecting

    So we could take the above and do

    select ...
    from   EMP
    connect by 1 = 1 
    

    and you would 'walk' a hierarchy forever! So then we could morph that into:

    select rownum
    from   dual
    connect by 1=1
    

    and get the integers 1,2,3, ... to infinity. And from there, we just add a terminating condtiion, eg

    select rownum
    from   dual
    connect by level <= 10
    

    because 'level' goes up each time we traverse the "hierarchy"

    And if you have integers (1,2,3,4...) then they can be used to dig into strings, eg "find the 1st character, find the 2nd character" or "find the 1st word, find the 2nd word" etc ....

    I've done a video on the topic here

    https://www.youtube.com/watch?v=UonikfFgEyM