I have a CLOB field in my Oracle table and would like to extract all occurrences of the string that matches the pattern 'RES_GetResData_Public_ScreenPrint' and its subsequent values (which are separated by '|' ).
Sample Data-
|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|
For the above-provided example, I used the below query to get all occurrences of the string 'RES_GetResData_Public_ScreenPrint' and its subsequent values (separated by | ) from the CLOB data.
The query is able to get all the occurrences of 'RES_GetResData_Public_ScreenPrint', but returns only the first subsequent value in all the rows.
**Query - **
select objectid,
REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1,
REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from test_table cross join table(cast(multiset(select level
from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))
Actual Result -
objectid | column1 | column2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028079 |
Expected Result -
objectid | column1 | column2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028081 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028007 |
I'm new to writing queries and any suggestion would be helpful.
Thank you !!
You do not need regular expression for this; although it is more to type you will probably find that simple string functions are faster than regular expressions:
WITH line_bounds (objectid, data, spos, epos) AS (
SELECT objectid,
data,
1,
INSTR(data, CHR(10), 1)
FROM test_table
UNION ALL
SELECT objectid,
data,
epos + 1,
INSTR(data, CHR(10), epos + 1)
FROM line_bounds
WHERE epos > 0
)
SEARCH DEPTH FIRST BY objectid SET orderid,
lines (objectid, line) AS (
SELECT objectid,
CASE epos
WHEN 0
THEN SUBSTR(data, spos)
ELSE SUBSTR(data, spos, epos -spos)
END
FROM line_bounds
),
match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
SELECT objectid,
line,
INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
FROM lines
)
SELECT objectid,
SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM match_bounds
WHERE res_spos > 0;
Which, for the sample data:
CREATE TABLE test_table (objectid, data) AS
SELECT 12345,
EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
FROM DUAL;
Outputs:
OBJECTID | COLUMN1 | COLUMN2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028081 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028007 |
If you did want to use regular expressions (please compare the two solutions performance on your data) then you can match the entire pattern and extract the value of the capturing groups:
select objectid,
REGEXP_SUBSTR(
DATA,
'\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
1,
column_value,
NULL,
1
) AS column1,
REGEXP_SUBSTR(
DATA,
'\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
1,
column_value,
NULL,
2
) AS column2
from test_table
cross join table(
cast(
multiset(
select level
from dual
connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
) as sys.odcinumberlist
)
)
Which has the same output.