I'm working on an SQL intrusion detection system (IDS) and I need to parse incoming SQL queries. Writing my own SQL parser would be a long-term task, and it would never exactly reflect the logic used in the native parser.
I found out that MySQL has a lexical analyzer with the main source file `sql/sql_lex.cc` and a syntax analyzer built with bison from `sql/sql_yacc.y`. I am really interested in reusing these robust solutions. I am building my IDS in C/C++, so I am looking for some way to connect the MySQL parser with my detection system.
I was wondering whether it is possible to reuse the MySQL parser (lexical + syntax analyzer) to get the structure of an SQL query in some logical form, e.g. a syntax tree. Would that be possible? Are there any related texts, tutorials, or projects?
Thanks
I have finished the first version of my IDS as part of my bachelor project. It is implemented as a plugin for MySQL.
I will list my main sources for understanding the MySQL internals below. Then I will briefly describe the approach I used in my IDS.
The source code of my solution can be found on SourceForge. I'm planning to document it a little more in its wiki.
The main entry point is the `audit_ids_notify()` function in `audit_ids.cc`. The plugin takes the query tree generated by the internal MySQL parser and makes a simplified version of it (to save memory). Then it performs anomaly detection: it has a list of known query tree structures and keeps some statistical information about each parametrizable part of each query tree structure. The output is written to a special log file in the MySQL data directory.
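To make the detection step described above more concrete, here is a minimal C++ sketch of the general idea. It is not the plugin's actual code and does not use any MySQL types: `Node`, `signature()`, `LengthStats`, `Detector` and `example_query()` are hypothetical names invented for this illustration, and the choice of literal length as the statistic and of the threshold is an arbitrary assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified query tree node (NOT a MySQL or plugin type).
struct Node {
    std::string kind;            // e.g. "SELECT", "EQ", "COLUMN", "LITERAL"
    std::string text;            // identifier name or literal value
    std::vector<Node> children;
};

// Builds a structure signature: every literal becomes "?" and its value is
// collected as one parametrizable part of the query.
void signature(const Node& n, std::string& sig, std::vector<std::string>& params) {
    if (n.kind == "LITERAL") { sig += "?"; params.push_back(n.text); return; }
    sig += n.kind;
    if (n.kind == "COLUMN" || n.kind == "TABLE") sig += ":" + n.text;
    sig += "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i) sig += ",";
        signature(n.children[i], sig, params);
    }
    sig += ")";
}

// Running mean and standard deviation of literal lengths (Welford's method).
struct LengthStats {
    std::size_t count = 0;
    double mean = 0.0, m2 = 0.0;
    void add(double x) {
        ++count;
        double d = x - mean;
        mean += d / static_cast<double>(count);
        m2 += d * (x - mean);
    }
    double stddev() const {
        return count > 1 ? std::sqrt(m2 / static_cast<double>(count - 1)) : 0.0;
    }
};

enum class Verdict { Known, UnknownStructure, UnusualParameter };

class Detector {
public:
    // Training: remember the structure and update per-parameter statistics.
    void learn(const Node& query) {
        std::string sig; std::vector<std::string> params;
        signature(query, sig, params);
        std::vector<LengthStats>& stats = known_[sig];
        if (stats.empty()) stats.resize(params.size());
        assert(stats.size() == params.size());  // same structure, same slots
        for (std::size_t i = 0; i < params.size(); ++i)
            stats[i].add(static_cast<double>(params[i].size()));
    }
    // Detection: unknown structure first, then outlying parameter lengths.
    Verdict check(const Node& query) const {
        std::string sig; std::vector<std::string> params;
        signature(query, sig, params);
        auto it = known_.find(sig);
        if (it == known_.end() || it->second.size() != params.size())
            return Verdict::UnknownStructure;
        for (std::size_t i = 0; i < params.size(); ++i) {
            const LengthStats& s = it->second[i];
            // Assumed threshold: mean + 3 standard deviations + 1 char slack.
            double limit = s.mean + 3.0 * s.stddev() + 1.0;
            if (static_cast<double>(params[i].size()) > limit)
                return Verdict::UnusualParameter;
        }
        return Verdict::Known;
    }
private:
    std::map<std::string, std::vector<LengthStats>> known_;
};

// Example tree for: SELECT name FROM users WHERE id = <literal>
Node example_query(const std::string& literal) {
    return Node{"SELECT", "", {
        Node{"COLUMN", "name", {}}, Node{"TABLE", "users", {}},
        Node{"EQ", "", {Node{"COLUMN", "id", {}}, Node{"LITERAL", literal, {}}}}}};
}
```

After calling `learn()` on a few trees built by `example_query()`, a tree with the same shape and a short literal is reported as `Known`, a tree with an extra `OR` branch is reported as `UnknownStructure`, and a tree with the same shape but a much longer literal is reported as `UnusualParameter`. A real detector would likely track richer statistics than length alone.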
I tried to make the solution modular and extensible. The initial version is a kind of demonstration, and its performance is not optimized, especially in the SQL storage module.
I identified 2 possible approaches and used the first one.
If you have any questions or problems related to this topic that I could answer, feel free to ask ;)