I have a bunch of files with sql log. I'm looking out to extract all occurrences of the following pattern
SQL log has sql that looks something like this
sel *
from DB.T1;
update DB.T1;
delete from DB.T2;
collect stats on
DB.T3 index (a,b,c);
sel count(*) from Db.T1;
sel count(*) from db . T2;
sel count(*) from db.t2;
I want to scan through the files starting with logs_ and extract all the unique tables followed by the string DB./db./Db./dB. As you can see there is white space after db in few instances
The output I'm expecting is a deduped list T1, T2, T3
I'm on Mac OS X.
This is what I was able to get. I could not get past this
grep -o -i 'tb.*\Z' *logs_* | uniq
This gives empty results. I was using \Z as I want till the end of the string (and not end of the line)
Need help to build the right command.
Something like:
grep -E -o -i 'DB ?\. ?[A-Z0-9$_]+' | cut -d . -f 2 | tr -d ' ' | sort -u
\Z is not supported by grep, as far as I can tell. And in languages that do support it, it really means until the end of the string, not the end of some "word" in the string. So you need to explicitly match the table name in your grep.
I use -E to use grep's extended regular expressions, which makes +
and ?
recognized as regex metacharacters. This isn't absolutely necessary; you could leave off the -E and use \+
and \?
instead.
The regular expression DB ?\. ?[A-Z0-9$_]+
(or DB \?\. \?[A-Z0-9$_]\+
if you leave off the -E flag) matches:
the literal characters "DB" (case insensitively, because of -i)
an optional space
a literal "."
an optional space
one or more of any ascii letters, digits, $ or _ (the characters that can appear in an unquoted mysql table name)
cut removes the database name, tr removes spaces before the table name, and sort -u returns just the unique table names. (uniq by itself does not do this; it only removes lines that are duplicates of the previous line, so only would have done what you want if you sorted first.)