I have a set of SQL queries that are badly formatted. I need to separate the dataset names and table names from these queries. Below is an example of my data.
I need to separate the dataset and table names using the dot in the string. This dot search should begin before the keywords "from, join, drop table, into"
The input data looks something like this - | query | |:---- | |select "a"."col_1" , "b"."col_2" from "advp"."accounts" a inner join "advp"."finance" b | |drop table "gqd"."customers"| |insert into "cvb"."stores" a where "a"."cols" = "some value"| |drop table "dataset"."table_name"|
the output data should look like this
query | dataset | table |
---|---|---|
select "a"."col_1" , "b"."col_2" from "advp"."accounts" a inner join "advp"."finance" b | advp | accounts |
advp | finance | |
drop table "gqd"."customers" | gqd | customers |
insert into "cvb"."stores" a where "a"."cols" = "some value" | cvb | stores |
drop table "dataset"."table_name" | dataset | table_name |
I tried with the following approach in SQL but could figure out to do a positive lookup for those words -
select query, Regrex_extract_all(query, r'([a-zA-Z0-9]+)\"\.\"') as dataset from table
You can extract dataset and table from group1 and group2 from following regex,
(?:from|join|drop table|into)\s*"([^"]+)"\."([^"]+)"
Explanation:
(?:from|join|drop table|into)
- Match any of the keywords you mentioned in your post.\s*
- Optionally match one or more white spaces"([^"]+)"\."([^"]+)"
- Match the dataset and table contained in doublequotes and place them in group1 and group2