Search code examples
sqloracle

Oracle SQL query using regexp_substr to get string between 2 strings


I am trying to retrieve data from a blob using regexp_substr. Here is how the blob looks like

"select a,b ,upper(c from xyz), d from t1 where xyx;"

Required output is

t1

Instead of where in the blob, it could be fetch ,order by or group by.

I am trying with

REGEXP_SUBSTR(query_blob,'FROM\s(.*?)\s(WHERE|FETCH|ORDER|GROUP)',1,1,'i',1)

but this gives me output as

xyz), d from t1

Could any one please help me to get the table name between from and (where or fetch or group by or order by).


Solution

  • Assuming a simple Oracle SQL query (i.e. without nested/correlated sub-queries) then you can use the regular expression ([a-z][a-z0-9_$#]*|"[^"]+") to only match the characters in an unquoted or quoted identifier (if you are using other languages that use different quoted identifiers, such as square brackets or backticks, then you can also add those) based on the Database Object Naming rules:

    1. Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.

    2. Nonquoted identifiers can only contain alphanumeric characters from your database character set and the underscore (_), dollar sign ($), and pound sign (#). Database links can also contain periods (.) and "at" signs (@). Oracle strongly discourages you from using $ and # in nonquoted identifiers.

      Quoted identifiers can contain any characters and punctuations marks as well as spaces. However, neither quoted nor nonquoted identifiers can contain double quotation marks or the null character (\0).

    And follow it with your check for additional filter/aggregation clauses (there could also be an alias [which in other non-Oracle SQL dialects could be preceded by the AS keyword] and/or a HAVING clause with no GROUP BY clause or no filters).

    Like this:

    SELECT REGEXP_SUBSTR(
             query_blob,
             'FROM\s+([a-z][a-z0-9_$#]*|"[^"]+")'            -- FROM clause
             || '(\s+([a-z][a-z0-9_$#]*|"[^"]+"))?'          -- FROM Alias
             || '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
             1,
             1,
             'i',
             1
           ) AS frm
    FROM   table_name;
    

    Which, for your sample data:

    CREATE TABLE table_name (query_blob) AS
    SELECT 'select a,b ,upper(c from xyz), d from t1 where xyx;' FROM DUAL
    

    Outputs:

    FRM
    t1

    fiddle


    If you want to make the expression even more complicated then:

    SELECT REGEXP_SUBSTR(
             query_blob,
             'FROM\s+'                                       -- FROM clause
             || '('
             ||   '([a-z][a-z0-9_$#]*|"[^"]+")'              -- Identifier
             ||   '(\(\s*'                                   -- Start Function call
             ||   '('                                        -- Start optional group
             ||   '[a-z][a-z0-9_$#]*'                        -- Unquoted identifier
             ||   '|"[^"]+"'                                 -- Or quoted identifier
             ||   '|''([^'']|'''')*'''                       -- Or string literal
             ||   ')?\s*'                                    -- End optional group
             ||   '('                                        -- Start zero-or-more group
             ||   ',\s*[a-z][a-z0-9_$#]*'                    -- Unquoted identifier
             ||   '|,\s*"[^"]+"'                             -- Or quoted identifier
             ||   '|,\s*''([^'']|'''')*'''                   -- Or string literal
             ||   ')*'                                       -- End zero-or-more group
             ||   '\s*\))?'                                  -- End Function call
             || ')'
             || '(\s+([a-z][a-z0-9_$#]*|"[^"]+"))?'          -- FROM Alias
             || '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
             1,
             1,
             'i',
             1
           ) AS frm
    FROM   table_name;
    

    Which, for the sample data:

    CREATE TABLE table_name (query_blob) AS
    SELECT 'select a,b ,upper(c from xyz), d from t1 where xyx;' FROM DUAL UNION ALL
    SELECT 'select a,b ,upper(c from xyz), d from abc(''xyz'') where xyx;' FROM DUAL UNION ALL
    SELECT 'select a,b ,upper(c from xyz), d from abc(def) where xyx;' FROM DUAL UNION ALL
    SELECT 'select a,b ,upper(c from xyz), d from abc( def, ''ghi'', jkl ) where xyx;' FROM DUAL
    

    Outputs:

    FRM
    t1
    abc('xyz')
    abc(def)
    abc( def, 'ghi', jkl )

    Or, if you want a simple expression that may be a 90% solution:

    SELECT REGEXP_SUBSTR(
             query_blob,
             'FROM\s+'                                       -- FROM clause
             || '(\S+)'                                      -- Identifier
             || '(\s+(\S+))?'                                -- FROM Alias
             || '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
             1,
             1,
             'i',
             1
           ) AS frm
    FROM   table_name;
    

    Which, for the same sample data, matches the first 3 rows but not the final one (as there are spaces inside the brackets of the function call).

    fiddle