Search code examples
data-warehousedbt

dbt table coverage metric


Given that I have a data warehouse with various tables being created from various sources many of them by dbt, I want to measure a concept like 'dbt table coverage' which I define as:

dtc = count(tables and views that exist) / count(non ephemeral models and sources)

This would be really useful in order to maintain a sense of quality/completeness, especially during transition projects. Is there a dbt command like:

dbt report table-coverage --schemas=['reporting','example']
>>> 96% coverage, 48/50 tables in the schemas provided are captured in dbt. 

If not, how can we add this to the project?!

What alternate approaches could I take to solving the problem


Solution

  • To do this I would probably create a model (view) that queried the information_schema and made some assumptions about a 1-to-1 mapping of {sourceTableName} to stg_{sourceTableName} (Assuming this means coverage for you).

    Additionally I would look into using the graph.sources.values() JINJA function in order to iterate through all of the documented sources in your project, and then compare that with each of the models in {target.schema}

    https://docs.getdbt.com/reference/dbt-jinja-functions/graph#accessing-sources

    If you're comparing the existence of source.schema.yml with the source.information_schema. I would alter the approach to consider mapping each of the items in the graph against the total count of items in the information_schema on the source database.