
Exposures in DBT


I'm fairly new to DBT and I'm trying to figure out how to use exposures. I've already read the documentation ( https://docs.getdbt.com/docs/building-a-dbt-project/exposures ), but I don't feel it answers my questions.

I understand the basic concept: you create an exposures file in your models folder, then declare its name and the tables/sources it depends on.

Q1 - Should I list the whole chain of upstream tables, or just the tables it depends on directly?

Q2 - What exactly is the benefit? Can you give a specific scenario?

Q3 - What is the purpose of dbt run -m exposure:name and dbt test -m exposure:name? Do they test the model or the exposure?

I've done exactly what they say in the documentation, I just do not get how I can use it.

Thank you in advance :-)


Solution

I’m not an expert in exposures, but I hope my answer can point you in the right direction.

Q1 - As far as I’m aware, you only need to specify the tables it depends on directly; dbt automatically resolves the rest of the upstream chain. It’s important to make sure that all your models and sources are properly configured and that you use the ref and source functions when referencing them, in your models as well as in the exposure’s depends_on list. This is how dbt tracks the nodes and dependencies to generate the DAG for the documentation.
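To illustrate why listing only the direct dependencies is enough, here is a toy Python sketch (not dbt’s actual implementation) showing how the full upstream set can be derived from direct parent edges alone. The node names and the PARENTS mapping are hypothetical; in a real project dbt builds these edges from your ref() and source() calls.

```python
from collections import deque

# Hypothetical toy project: each node maps to the nodes it depends on
# directly (the edges dbt would infer from ref() and source() calls).
PARENTS: dict[str, list[str]] = {
    "source.raw.orders": [],
    "source.raw.customers": [],
    "model.stg_orders": ["source.raw.orders"],
    "model.stg_customers": ["source.raw.customers"],
    "model.customer_orders": ["model.stg_orders", "model.stg_customers"],
    # The exposure lists only its direct dependency.
    "exposure.weekly_report": ["model.customer_orders"],
}


def upstream(node: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every ancestor of `node` by walking parent edges breadth-first."""
    seen: set[str] = set()
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


print(sorted(upstream("exposure.weekly_report", PARENTS)))
# ['model.customer_orders', 'model.stg_customers', 'model.stg_orders',
#  'source.raw.customers', 'source.raw.orders']
```

Even though the exposure only lists model.customer_orders, walking the parent edges also reaches the staging models and raw sources, which is the kind of lineage dbt shows in its docs DAG.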

Q2 - One benefit of exposures is that they improve your documentation and help the team understand how data flows into reports and dashboards. Let’s say business users ask for new requirements, or changes need to be made to a dashboard: the analyst can go straight to the exposure, see all its dependencies and the code the dashboard relies on, make a quick decision from there, and pass the requirements on to the ETL team or whoever owns them. Another example relates to refreshes. Imagine you are working on a series of objects from the same context or tag (for instance, a project), and you need to refresh only the objects in that project’s scope that are used by a specific dashboard. To achieve that, you can run the dbt command for that exposure only.
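The impact-analysis scenario can be sketched the same way in reverse: start from a model you plan to change, walk its downstream (child) edges, and collect the exposures you reach. This is a simplified Python illustration, not a dbt feature or API; the project graph, model names, and dashboard names below are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy project: node -> direct parents (inferred from ref()/source()).
PARENTS: dict[str, list[str]] = {
    "model.stg_orders": [],
    "model.stg_customers": [],
    "model.customer_orders": ["model.stg_orders", "model.stg_customers"],
    "model.finance_summary": ["model.stg_orders"],
    "exposure.sales_dashboard": ["model.customer_orders"],
    "exposure.finance_dashboard": ["model.finance_summary"],
}


def children_map(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the parent edges so we can walk downstream."""
    children: dict[str, list[str]] = defaultdict(list)
    for node, deps in parents.items():
        for dep in deps:
            children[dep].append(node)
    return children


def affected_exposures(changed: str, parents: dict[str, list[str]]) -> list[str]:
    """Return the exposures that depend, directly or indirectly, on `changed`."""
    children = children_map(parents)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(n for n in seen if n.startswith("exposure."))


print(affected_exposures("model.stg_customers", PARENTS))
# ['exposure.sales_dashboard']
print(affected_exposures("model.stg_orders", PARENTS))
# ['exposure.finance_dashboard', 'exposure.sales_dashboard']
```

Changing stg_customers only touches the sales dashboard, while changing stg_orders affects both, which is the kind of question an analyst can answer quickly once exposures are declared.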

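Q3 - The purpose of those commands is to run and test only the models (and their upstream references) that a particular exposure depends on. They act on those models, not on the exposure itself, which is a documentation node that dbt does not build. Note the + prefix in the selector, as in dbt run -m +exposure:name: according to the dbt docs, the exposure selector is meant to be combined with + so that it picks up the exposure’s upstream models. You can think of this as another way of tagging reporting objects, or whatever is declared in the exposure. It can be really useful in some cases.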
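Roughly, the +exposure:name selector picks the models upstream of the exposure for dbt run, and the tests attached to those models for dbt test. The Python sketch below approximates that selection from a dictionary shaped like the parent_map section of dbt’s target/manifest.json (an assumption worth checking against your dbt version’s manifest). The node IDs are hypothetical, and dbt’s real test-selection rules have more nuance than the simple "any parent is a selected model" check used here.

```python
from collections import deque

# Hypothetical data shaped like the "parent_map" section of dbt's
# target/manifest.json: node unique_id -> list of its direct parents.
PARENT_MAP: dict[str, list[str]] = {
    "source.shop.raw.orders": [],
    "model.shop.stg_orders": ["source.shop.raw.orders"],
    "model.shop.orders_daily": ["model.shop.stg_orders"],
    "model.shop.unrelated": ["source.shop.raw.orders"],
    "test.shop.not_null_stg_orders_id": ["model.shop.stg_orders"],
    "test.shop.unique_unrelated_id": ["model.shop.unrelated"],
    "exposure.shop.orders_dashboard": ["model.shop.orders_daily"],
}


def ancestors(node: str, parent_map: dict[str, list[str]]) -> set[str]:
    """Collect every node reachable by following parent edges from `node`."""
    seen: set[str] = set()
    queue = deque(parent_map.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parent_map.get(current, []))
    return seen


def approx_plus_exposure(
    exposure: str, parent_map: dict[str, list[str]]
) -> tuple[list[str], list[str]]:
    """Approximate '+exposure:<name>': upstream models to run, and their tests."""
    upstream = ancestors(exposure, parent_map)
    # Only models are built by `dbt run`; sources and the exposure itself are not.
    models = sorted(n for n in upstream if n.startswith("model."))
    selected = set(models)
    # Tests are children of models, so they are not upstream of the exposure;
    # here we simply pick tests with at least one selected model as a parent.
    tests = sorted(
        n
        for n, parents in parent_map.items()
        if n.startswith("test.") and any(p in selected for p in parents)
    )
    return models, tests


models, tests = approx_plus_exposure("exposure.shop.orders_dashboard", PARENT_MAP)
print(models)  # ['model.shop.orders_daily', 'model.shop.stg_orders']
print(tests)   # ['test.shop.not_null_stg_orders_id']
```

Sources and the exposure itself are left out of the run list because dbt run does not build either of them, which is why these commands end up operating on the models rather than on the exposure. The unrelated model and its test are skipped because the dashboard does not depend on them.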

Hope that helps, thanks!