I am trying to extract table names from a SQL script. I have the following string:
from db1.tableX tbx1 --quantity table
inner join tableY tbx2 on tbx1.xyz=tbx2.xyz
I don't want to match this string because tabley
is not prefixed with a database name. The way I am trying to do this is by detecting the word "on" in the string before the database name.
My regex is matching tablex
but also matches tbx1
, which I dont want my expression to match.
I would like it to match only in this situation.
from db1.tableX tbx1 --quantity table
inner join db1.tableY tbx2 on tbx1.xyz = tbx2.xyz
My regex should give me tableX
and tableY
because they are prefixed by a database name.
Here is my expression:
(insert\s+into|from|inner\s+join|left\s+outer\s+join|join)\s+[\sa-zA-Z0-9_$#-]*\.\s*(?<table>[a-zA-Z0-9_]+)(?=\s+[a-zA-Z0-9_$#-]+)*
You can simplify your expression
(?<=\b(from|join)\s+[a-zA-Z0-9_$#-]*\.\s*)[a-zA-Z0-9_]+
It will yield the table name directly, without having to use named groups. I am using the pattern
(?<=prefix)find
This returns only the part "find" as match value. As prefix I use "from" or "join" followed by the database name the dot and possibly spaces.
\b
is the beginning or end of a word.
The last part [a-zA-Z0-9_]+
is the table name.
It makes no difference whether you have an inner join, outer join etc., so I dropped this distinction.
Comments are ugly to detect, beacuse they can contain about anything and can appear about everywhere. Also you have two types of comments -- Until end of line
and \* ... *\
.
You could try to detect comments and spaces like this
(\s|--.*?$|\\\*.*?\*\\)+
Note that the backspace and the star have to be escaped, so \*
becomes \\\*
and *\
becomes \*\\
.
$
denotes the end of line. The ?
after .*
ensures that not several comments are skipped at once, thus skipping meaningful text between them.
This will work in most cases; however, comment like structures could be included within string literals: SELECT 'hello -- world' AS greeting FROM ...
. This is not a comment! Without a full syntax analysis of the SQL text you will not be able to cover all possible cases. But you cannot do this with Regex. Regex has its limits.