Reusing MySQL parser

I'm working on SQL intrusion detection system (IDS) and I need do parse incoming SQL queries. Writing own SQL parser is a long term task and it will never exactly reflect the logic used in native parser. I found out that MySQL has a lexical analyzer with main source file sql/sql_lex.cc and a syntax analyzer built with bison from sql/sql_yacc.y. I am really interested in reusing this robust solutions. I am building my IDS in C/C++, so I am looking for some way to connect MySQL parser with my detection system.

I was wondering if It is possible to reuse the MySQL parser (lexical+syntax analyzer) to get the structure of SQL query in some logical form e.g. syntax tree. Would it be possible? Are there some related text, tutorials or projects?

Thanks

Solution

I have finished the first version of my IDS as a part of my bachelor project. It is implemented as plugin for MySQL.

I will list my main sources for understanding the MySQL internals bellow. Then I shortly describe the approach I used in my IDS.

MySQL documentation texts

I found books Expert MySQL by Charles Bell and Understanding MySQL Internals by Sasha Pachev (as user3822447 wrote) to be very good entry point for understanding the internals of MySQL.
MySQL 5.1 Plugin Development by Andrew Hutchings & Sergei Golubchik is also very useful.
The MySQL Internals Manual also contains some basic information good to start with.
After all reading I did som debugging (using VS) and discovered how the query tree structure looks like.

My solution for IDS

The source code of my solution can be found at sourceforge. I'm planning to document it little more in its wiki.

The main entry point is the audit_ids_notify() function in audit_ids.cc. The plugin takes query tree generated by internal MySQL parser a makes simplified version of it (to save memory). Then it does anomally detection - it has a list of known query tree structures and keeps some statistical information about each parametrizable part of each query tree structure. The output is written into special log file in the MySQL data directory.

I tried to make the solution modular and extendable. The initial version is kind of demostration and the performance is not optimized especially in SQL storage module.

MySQL plugin type

I identified 2 possible approaches and used the first one.

audit plugin
- The type of wrapper in my solution plugin is audit plugin.
- I used this type of plugins despite being used to report server operations (e.g. to log queries or errors).
- I chose this type of plugin because I found out that this is the only native supported plugin which is called when the query tree is after the completed (i.e. parsed) and before it is freed from the memory (for MySQL 5.6.17).
- Disadvantage: the above is not fully guaranteed in future versions of MySQL, but in my opinion this shouldn't change in the close future.
- Advantage: the MySQL does not need to recompiled. It is enough to build and install the plugin.

query-rewrite plugin
- There is also an alternative approach doing this using non-native plugin type query-rewrite. It provised plugin API for modifying the query thus also for reading it.
- Disadvantage: to support this plugin API the MySQL server must be recompiled with the API. I think may become a part of MySQL production distribution.
- Advantage: plugin type designed for reading/rewriting the internal query tree.

If there are some questions/problems related to this topic I could answer feel free to ask ;)