Given that I have a data warehouse with various tables being created from various sources many of them by dbt, I want to measure a concept like 'dbt table coverage' which I define as:
dtc = count(tables and views that exist) / count(non ephemeral models and sources)
This would be really useful in order to maintain a sense of quality/completeness, especially during transition projects. Is there a dbt command like:
dbt report table-coverage --schemas=['reporting','example']
>>> 96% coverage, 48/50 tables in the schemas provided are captured in dbt.
If not, how can we add this to the project?!
What alternate approaches could I take to solving the problem
To do this I would probably create a model (view) that queried the information_schema and made some assumptions about a 1-to-1 mapping of {sourceTableName}
to stg_{sourceTableName}
(Assuming this means coverage for you).
Additionally I would look into using the graph.sources.values()
JINJA function in order to iterate through all of the documented sources in your project, and then compare that with each of the models in {target.schema}
https://docs.getdbt.com/reference/dbt-jinja-functions/graph#accessing-sources
If you're comparing the existence of source.schema.yml
with the source.information_schema
. I would alter the approach to consider mapping each of the items in the graph against the total count of items in the information_schema on the source database.