Search code examples
regexoracle-databaseloopsclob

Oracle query to list all occurrences of a string from CLOB field


I have a CLOB field in my Oracle table and would like to extract all occurrences of the string which matches the pattern 'RES_GetResData_Public_ScreenPrint'.

**Example - **

|RES_GetResData_Public_ScreenPrint011| |RES_GetResData_Public_ScreenPrint023| |RES_GetResData_Public_ScreenPrint086| |RES_GetResData_Public_ScreenPrint100|

I used the below query to get all occurrences of the string 'RES_GetResData_Public_ScreenPrint' in CLOB data for the above-provided example. The query is able to get all 4 occurrences but displays only the first occurrence of the string in all 4 rows.

Query -

select objectid,
REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^|]+', 1,1) column1 from perftest.NRS_DATASOURCEDATA_A_Test  cross join
table(cast(multiset(select level from dual
connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') 
) as sys.odcinumberlist))

Actual Result -

objectid column1
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011

Expected Result -

objectid column1
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint023
12345 RES_GetResData_Public_ScreenPrint086
12345 RES_GetResData_Public_ScreenPrint100

Solution

  • I think you're missing some of your query? But the issue is the second '1' in regexp_substr. Here's the description of the 4th arg from the docs:

    occurrence is a positive integer indicating which occurrence of pattern in source_char Oracle should search for. The default is 1, meaning that Oracle searches for the first occurrence of pattern.

    So regexp_count found 4 instances, but each time regexp_substr gets the first one. You'll want to change it from 1 to level, so the regexp_substr is looking at the same occurrence as regexp_count for each row.

    Without the rest of the query, I can't say for sure, but I think level probably got re-aliased to column_value by the table/cast/multiset statement, so I think that's probably what you want.

    select objectid,

        REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^|]+', 1,column_value) column1
 
    from test_table 
    cross join
 table(cast(multiset(select level from dual
 
                     connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint')
    ... more query, probably? maybe just ")))" ...
    

    As a side note, you can maybe rewrite the table/cast/multiset to a simpler inline view - see this question.

    --EDIT--

    Ok, it sounds like your clob is actually structured data. If you know that there are 3 fields in each row, like this:

    RES_GetResData_Public_ScreenPrintXXX F99999999 (blank)

    then the column_value that you get back from the cross-join will basically be your row number. Your column2 regexp_substr is now just looking for any field - so if you want the second column, it'll be (row number)*(number of columns)+(the column you want, which is 2). In other words:

    (column_value-1)*3+2 will give you column 2. Change the 2 to a 3 if you want column 3.

    If your data actually has 4 columns, you'll need to change the 3 to a 4.

    select objectid, 
        REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1, 
        REGEXP_SUBSTR(DATA, '([^\|]+)', 1, (column_value-1)*3+2) AS column2,
        column_value 
    from test_table 
    cross join 
        table(cast(multiset(select level from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))