Search code examples
regexregex-lookarounds

Regex to separate string using a dot before a set of words


I have a set of SQL queries that are badly formatted. I need to separate the dataset names and table names from these queries. Below is an example of my data.

I need to separate the dataset and table names using the dot in the string. This dot search should begin before the keywords "from, join, drop table, into"

The input data looks something like this - | query | |:---- | |select "a"."col_1" , "b"."col_2" from "advp"."accounts" a inner join "advp"."finance" b | |drop table "gqd"."customers"| |insert into "cvb"."stores" a where "a"."cols" = "some value"| |drop table "dataset"."table_name"|

the output data should look like this

query dataset table
select "a"."col_1" , "b"."col_2" from "advp"."accounts" a inner join "advp"."finance" b advp accounts
advp finance
drop table "gqd"."customers" gqd customers
insert into "cvb"."stores" a where "a"."cols" = "some value" cvb stores
drop table "dataset"."table_name" dataset table_name

I tried with the following approach in SQL but could figure out to do a positive lookup for those words -

select query, Regrex_extract_all(query, r'([a-zA-Z0-9]+)\"\.\"') as dataset from table


Solution

  • You can extract dataset and table from group1 and group2 from following regex,

    (?:from|join|drop table|into)\s*"([^"]+)"\."([^"]+)"
    

    Demo

    Explanation:

    • (?:from|join|drop table|into) - Match any of the keywords you mentioned in your post.
    • \s* - Optionally match one or more white spaces
    • "([^"]+)"\."([^"]+)" - Match the dataset and table contained in doublequotes and place them in group1 and group2