Search code examples
amazon-web-servicesamazon-redshiftdocumentation

Automatically generate data documentation in the Redshift cluster


I am trying to automatically generate a data documentation in the Redshift cluster for all the maintained data products, but I am having trouble to do so.

Is there a way to fetch/store metadata about tables/columns in redshift directly?

Is there also some automatic way to determine what are the unique keys in a Redshift table?

For example an ideal solution would be to have:

  • Table location (cluster, schema, etc.)
  • Table description (what is the table for)
  • Each column's description (what is each column for, data type, is it a key column, if so what type, etc.)
  • Column's distribution (min, max, median, mode, etc.)
  • Columns which together form a unique entry in the table

I fully understand that getting the descriptions automatically is pretty much impossible, but I couldn't find a way to store the descriptions in redshift directly, instead I'd have to use 3rd party solutions or generally a documentation outside of the SQL scripts, which I'm not a big fan of, due to the way the data products are built right now. Thus having a way to store each table's/column's description in redshift would be greatly appreciated.


Solution

  • Amazon Redshift has the ability to store a COMMENT on:

    • TABLE
    • COLUMN
    • CONSTRAINT
    • DATABASE
    • VIEW

    You can use these comments to store descriptions. It might need a bit of table joining to access.

    See: COMMENT - Amazon Redshift