I am trying to retrieve data from a blob using regexp_substr. Here is how the blob looks like
"select a,b ,upper(c from xyz), d from t1 where xyx;"
Required output is
t1
Instead of where
in the blob, it could be fetch
,order by
or group by
.
I am trying with
REGEXP_SUBSTR(query_blob,'FROM\s(.*?)\s(WHERE|FETCH|ORDER|GROUP)',1,1,'i',1)
but this gives me output as
xyz), d from t1
Could any one please help me to get the table name between from and (where or fetch or group by or order by).
Assuming a simple Oracle SQL query (i.e. without nested/correlated sub-queries) then you can use the regular expression ([a-z][a-z0-9_$#]*|"[^"]+")
to only match the characters in an unquoted or quoted identifier (if you are using other languages that use different quoted identifiers, such as square brackets or backticks, then you can also add those) based on the Database Object Naming rules:
Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.
Nonquoted identifiers can only contain alphanumeric characters from your database character set and the underscore (_), dollar sign ($), and pound sign (#). Database links can also contain periods (.) and "at" signs (@). Oracle strongly discourages you from using $ and # in nonquoted identifiers.
Quoted identifiers can contain any characters and punctuations marks as well as spaces. However, neither quoted nor nonquoted identifiers can contain double quotation marks or the null character (\0).
And follow it with your check for additional filter/aggregation clauses (there could also be an alias [which in other non-Oracle SQL dialects could be preceded by the AS
keyword] and/or a HAVING
clause with no GROUP BY
clause or no filters).
Like this:
SELECT REGEXP_SUBSTR(
query_blob,
'FROM\s+([a-z][a-z0-9_$#]*|"[^"]+")' -- FROM clause
|| '(\s+([a-z][a-z0-9_$#]*|"[^"]+"))?' -- FROM Alias
|| '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
1,
1,
'i',
1
) AS frm
FROM table_name;
Which, for your sample data:
CREATE TABLE table_name (query_blob) AS
SELECT 'select a,b ,upper(c from xyz), d from t1 where xyx;' FROM DUAL
Outputs:
FRM |
---|
t1 |
If you want to make the expression even more complicated then:
SELECT REGEXP_SUBSTR(
query_blob,
'FROM\s+' -- FROM clause
|| '('
|| '([a-z][a-z0-9_$#]*|"[^"]+")' -- Identifier
|| '(\(\s*' -- Start Function call
|| '(' -- Start optional group
|| '[a-z][a-z0-9_$#]*' -- Unquoted identifier
|| '|"[^"]+"' -- Or quoted identifier
|| '|''([^'']|'''')*''' -- Or string literal
|| ')?\s*' -- End optional group
|| '(' -- Start zero-or-more group
|| ',\s*[a-z][a-z0-9_$#]*' -- Unquoted identifier
|| '|,\s*"[^"]+"' -- Or quoted identifier
|| '|,\s*''([^'']|'''')*''' -- Or string literal
|| ')*' -- End zero-or-more group
|| '\s*\))?' -- End Function call
|| ')'
|| '(\s+([a-z][a-z0-9_$#]*|"[^"]+"))?' -- FROM Alias
|| '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
1,
1,
'i',
1
) AS frm
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (query_blob) AS
SELECT 'select a,b ,upper(c from xyz), d from t1 where xyx;' FROM DUAL UNION ALL
SELECT 'select a,b ,upper(c from xyz), d from abc(''xyz'') where xyx;' FROM DUAL UNION ALL
SELECT 'select a,b ,upper(c from xyz), d from abc(def) where xyx;' FROM DUAL UNION ALL
SELECT 'select a,b ,upper(c from xyz), d from abc( def, ''ghi'', jkl ) where xyx;' FROM DUAL
Outputs:
FRM |
---|
t1 |
abc('xyz') |
abc(def) |
abc( def, 'ghi', jkl ) |
Or, if you want a simple expression that may be a 90% solution:
SELECT REGEXP_SUBSTR(
query_blob,
'FROM\s+' -- FROM clause
|| '(\S+)' -- Identifier
|| '(\s+(\S+))?' -- FROM Alias
|| '\s*(\s(WHERE|FETCH|ORDER|GROUP|HAVING)|$)', -- Filter/Aggregation Clauses
1,
1,
'i',
1
) AS frm
FROM table_name;
Which, for the same sample data, matches the first 3 rows but not the final one (as there are spaces inside the brackets of the function call).