Search code examples
dbt

Good way to use/design dbt relation test


I would like to have a dbt relation test for a model m_0.

models:
  - name: m_0
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('m_1')
              field: id

Then I got two questions:

  1. I'm running the model m_0 under schema s_a, how to handle the case that m_1 does not exist? There's no guarantee of m_1 must be run before m_0.
  2. How to guarantee the relation test result is not flaky? Since m_0 and m_1 are scheduled independently, some latest id may not read from m_1 but are already in m_0.

Any good practice for writing a dbt relation test? Is there any requirements of m_0 and m_1? Such as m_1 should be a parent table of m_0?


Solution

    1. The test is its own node, which will be downstream in the DAG from both m_0 and m_1. If you run a command like dbt build, dbt will first build m_0 and m_1 before running this relationship test. However, you are right that if you run dbt test -s m_0 and m_1 is not yet built, the test will fail. If you want to avoid this failure, you can use dbt test -s m_0 --indirect-selection=cautious and the relationship test will not run. See the docs

    2. You should consider adding a where: config to this test, that limits the records tested in m_0. For example, where: created_at < current_timestamp - interval '1 day'. This should leave "extra" records in m_1, but this test is only concerned about missing records in m_1, so it will pass