Suppose I have a single record in Table test_tabfile, with default_sort field contains comma delimited values:
table_name schema_name default_sort
------------ ------------- --------------------------------------
Population 2017 GEOTYPE, STATE, COUNTY, ZIP, MSA, CSA
The following legacy SQL script separates values in the default_sort field and creates new records with distinct values field, index_column. How does it achieve this?
I am trying to understand it and rewrite it more intuitively. Thanks.
SELECT schema_name AS schemaname,
table_name AS tablename,
REGEXP_SUBSTR (default_sort, '[^,]+', 1, rn) AS index_column
FROM test_tabfile
CROSS JOIN
( SELECT ROWNUM rn
FROM (SELECT MAX (LENGTH (REGEXP_REPLACE (default_sort, '[^,]+'))) + 1 mx
FROM test_tabfile)
CONNECT BY LEVEL <= mx);
Here are the sample output data:
Schmemaname tablename index_column
----------- ----------- ------------
2017 Population GEOTYPE
2017 Population STATE
2017 Population COUNTY
2017 Population ZIP
2017 Population MSA
2017 Population CSA
CONNECT BY is 'intended' to be used to navigate a hierarchy, eg
select ...
from EMP
start with MGR is null -- ie, the CEO
connect by prior MGR = EMPNO -- ie, link an employee to his manager
but a while ago, people worked it could be used to traverse a "fictional" hierarchy, by
So we could take the above and do
select ...
from EMP
connect by 1 = 1
and you would 'walk' a hierarchy forever! So then we could morph that into:
select rownum
from dual
connect by 1=1
and get the integers 1,2,3, ... to infinity. And from there, we just add a terminating condtiion, eg
select rownum
from dual
connect by level <= 10
because 'level' goes up each time we traverse the "hierarchy"
And if you have integers (1,2,3,4...) then they can be used to dig into strings, eg "find the 1st character, find the 2nd character" or "find the 1st word, find the 2nd word" etc ....
I've done a video on the topic here