I want to annotate my DDL commands with metadata for multiple data stores' SQL dialects, like PostgreSQL, Snowflake SQL...
ex:
CREATE TABLE dummy (col_name type ... `*metadata_annotation*`, ...);
And I also want to make it mandatory that every column has a metadata annotation.
Later I want to remove these annotations, create a map out of them, and restore the original DDL SQL commands.
ex_map:
table: [col_name: metadata,...]
stripped out sql:
CREATE TABLE dummy (col_name type ... , ...);
I thought of modifying the SQL grammar in ANTLR and handling everything through a parse-tree visitor.
Or of using a regex that can do the same, since no unambiguous grammar is available.
The SQL standard includes nothing for storing metadata.
You can either:

1. Annotate with your own defined syntax that you'll parse yourself, inside standard SQL comments:
CREATE TABLE dummy (col_name type ... /*@metadata_annotation*/, ...);
but you have no guarantee that the RDBMS will retain them (in information_schema, for example).
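The stripping and map-building the question asks about can then be done with a small tool. A minimal sketch in Python, assuming annotations are written as `/*@...*/` right after each column definition and that column definitions are separated by top-level commas (real-world DDL, with nested parentheses or quoted strings, would need a proper parser such as an ANTLR-generated one):

```python
import re

# Matches a /*@...*/ annotation; the metadata itself is assumed not to contain '*'.
ANNOT = re.compile(r"/\*@(?P<meta>[^*]+)\*/")

def strip_annotations(ddl: str):
    """Return (stripped_ddl, {table: {column: metadata}}).

    Raises ValueError if any column lacks an annotation, enforcing the
    'every column must be annotated' rule.
    """
    m = re.search(r"CREATE\s+TABLE\s+(\w+)\s*\((.*)\)\s*;", ddl, re.S | re.I)
    if not m:
        raise ValueError("not a CREATE TABLE statement")
    table, body = m.groups()
    columns = {}
    for coldef in body.split(","):  # naive split: assumes no commas inside types
        col = coldef.strip().split()[0]
        annot = ANNOT.search(coldef)
        if not annot:
            raise ValueError(f"column {col!r} has no metadata annotation")
        columns[col] = annot.group("meta").strip()
    # Removing the comments restores the plain DDL.
    stripped = ANNOT.sub("", ddl)
    return stripped, {table: columns}

ddl = "CREATE TABLE dummy (col_name text /*@pii:name*/, col2 int /*@id*/);"
stripped, mapping = strip_annotations(ddl)
```

The raised ValueError is what makes the annotation mandatory: an unannotated column aborts the workflow instead of silently passing through.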
It is a case-by-case (RDBMS-by-RDBMS) question, unless you control the DDL workflows.
2. Use the comment facilities provided by the RDBMS itself. There too, define your own naming. Example in PostgreSQL:
CREATE TABLE dummy (col_name type ..., ...);
COMMENT ON COLUMN dummy.col_name IS '@metadata_annotation';
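If you go this way, the map extracted from the annotated DDL can be turned back into such statements. A sketch (the helper name and map shape are my own, matching the example above):

```python
def to_comment_statements(mapping):
    """Turn {table: {column: metadata}} into PostgreSQL COMMENT ON statements,
    so annotations stripped from the DDL survive inside the catalog."""
    stmts = []
    for table, columns in mapping.items():
        for col, meta in columns.items():
            stmts.append(f"COMMENT ON COLUMN {table}.{col} IS '@{meta}';")
    return stmts
```

(For real metadata values you would also need to escape single quotes, which this sketch skips.)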
There you are even more tightly tied to the RDBMS you want to run on.

3. Store the annotations in a dedicated metaschema table of your own:
CREATE TABLE my_annotations (s text, t text, c text, comment text);
INSERT INTO my_annotations VALUES ('my_schema', 'dummy', 'col_name', 'metadata_annotation');
This approach has benefits of its own.
Note that you can combine this solution (for storing) with solution 1 (for annotating), if you control the DDL workflow and can intercept the annotations in the SQL to transform them into INSERTs into your metaschema.
This would allow you to validate the DDL SQL (any unannotated DDL would be rejected), but it requires an extensive parser (not only for CREATE TABLE, but for ALTER TABLE too, be it to modify columns or to add new ones).
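The interception step is then just another rendering of the same map. A sketch, assuming the my_annotations table defined above and 'my_schema' as a placeholder schema name:

```python
def to_annotation_inserts(mapping, schema="my_schema"):
    """Turn {table: {column: metadata}} into INSERTs into the
    my_annotations (s, t, c, comment) metaschema table."""
    return [
        f"INSERT INTO my_annotations VALUES "
        f"('{schema}', '{table}', '{col}', '{meta}');"
        for table, columns in mapping.items()
        for col, meta in columns.items()
    ]
```

Run it on the output of the annotation stripper and append the resulting INSERTs to the stripped DDL file before playing it.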
On the other hand, the declarative way, in a custom schema-describing table, necessarily decouples the annotations from the creation,
but you can run your validator (ensuring that each column of information_schema.columns has a corresponding entry in your metaschema table) after each DDL file is played, and even on a regular basis (to ensure nothing bypassed your DDL workflow by creating columns either manually or via a stored proc you couldn't catch).
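Such a validator can be a single anti-join against the catalog. A sketch in PostgreSQL flavor, assuming the my_annotations table above (any row returned is an unannotated column):

```sql
SELECT c.table_schema, c.table_name, c.column_name
FROM information_schema.columns c
LEFT JOIN my_annotations a
  ON  a.s = c.table_schema
  AND a.t = c.table_name
  AND a.c = c.column_name
WHERE a.c IS NULL
  AND c.table_schema NOT IN ('pg_catalog', 'information_schema');
```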
Of course your metainformation will concentrate on functional annotations,
not technical ones that you can infer from the RDBMS-provided catalog,
like types and even constraints (information_schema.key_column_usage on PostgreSQL, MySQL, SQL Server and so on; all_cons_columns on Oracle).
If you really need to cross-reference functional annotations with technical reality,
either have a view that aggregates both,
or, if you give them dedicated entries in your metaschema (for example, you need to dump your DB with all its description in order to later restore it on another RDBMS, where the description will serve to reconstruct all constraints in that new RDBMS's specific DDL),
make frequent consistency checks between the stored version and the database's catalog views.