Sharing tests between dbt models


I have a bunch of dbt models that share about 90% of their structure. The idea is that these models will be combined into a single unified downstream model during the dbt run. Currently my tests for the models have a lot of duplication. For example:

    - name: model1
      columns:
        - name: colA
          tests:
            - accepted_values:
                values: ['a', 'b']
        - name: colB
          tests:
            - not_null

    - name: model2
      columns:
        - name: colA
          tests:
            - accepted_values:
                values: ['a', 'b', 'c']
        - name: colB
          tests:
            - not_null

I'd like to reduce the duplication in the schema.yml file by reusing the test config with small variations.

What I have tried so far

  1. Defining the tests as a var in dbt_project.yml and referencing it in schema.yml. This works, but it does not allow any variation between models.

  2. Defining a macro that returns a Python list containing the test config, and calling the macro like this:

    columns: "{{ common_tests() }}"

This doesn't work; I get the error "could not render {{ common_tests() }} 'common_tests' is undefined".

Interestingly, it is possible to render YAML with a macro inside an individual test in the YAML file, just not at the top level.

I feel there should be an easy(ish) solution here, I'm just not finding it. Thanks in advance.


Solution

  • If you don’t mind defining all these models in a single .yml file, you can use YAML anchors for this.

    Josh Devlin has a nice write-up here:

    
    version: 2
    
    models:
      - name: model_one
        columns:
          - name: id
            tests: &unique_not_null
              - unique
              - not_null
          - name: col_a
          - name: col_b
      - name: model_two
        columns:
          - name: id
            tests: *unique_not_null
          - name: col_c
          - name: col_d
    

    Josh’s example shows an anchor on the tests key for a single column, but you could also put an anchor on the columns key. That doesn’t work so well, though: the merge key (<<) only merges mappings, so a single change in a single test means repeating the whole list. YAML has no equivalent for merging or overriding lists or individual list items, which is really what you need here.
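
    Applied to the models from the question, a middle ground might look like the sketch below. It assumes both models live in the same .yml file, and the anchor name colB is made up for illustration. The identical colB entry is anchored once and reused with an alias, while colA, which differs between models, is still written out in full. I haven't run this against a specific dbt version, so treat it as a starting point:

    ```yaml
    version: 2

    models:
      - name: model1
        columns:
          # colA differs between models, so it is spelled out each time.
          - name: colA
            tests:
              - accepted_values:
                  values: ['a', 'b']
          # colB is identical everywhere: anchor the whole column entry.
          # (The anchor name "colB" is arbitrary.)
          - &colB
            name: colB
            tests:
              - not_null

      - name: model2
        columns:
          - name: colA
            tests:
              - accepted_values:
                  values: ['a', 'b', 'c']
          # The alias reuses the anchored colB entry unchanged.
          - *colB
    ```

    This only removes duplication for columns that are exactly the same in every model. Any column with even a small variation, like colA here, has to be repeated.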