Search code examples
dbt

How to pass --depends_on flag on dbt test


I have a dbt test that uses a macro with a ref embedded liked this:

{% macro alpha_art_creation_proposal_status_history() %}

    final AS (
                            SELECT
                                id AS proposal_id,
                                order_item_id,
                                LAG(customer_message_at) OVER (PARTITION BY order_item_id ORDER BY customer_message_at) AS prior_message_at,
                                customer_message_at AS message_at
                            FROM
                                {{ref('alpha_order_item_briefing')}}
),
{% endmacro %}

I also have a generic_test that uses that macro. I use this test to check if the column of my silver layer has the same sum value as the column of the bronze layer. Here follows my test:

{% test expect_column_int_sum_to_equal_other_table(model, other_table, column_name, column_other_table, derived_table=none) %}

{{derived_table if derived_table is not none}}

SELECT 1
GROUP BY 1
HAVING (
    (SELECT sum({{column_other_table}}) FROM
        {% if derived_table is not none %}
            final
        {% else %}
            {{other_table}}
        {% endif %}
    ) -

    (SELECT sum({{column_name}}) FROM {{model}})
) != 0

{% endtest %}

My macro is passed through the variable "derived" and is optional. The derived variable is used when some transformation of the bronze table is required to migrate to the silver layer. At the tests.yml I have the following structure:

version: 2

models:
  - name: alpha_art_creation_proposal_status_history
    columns:
      - name: id
        tests:
          - dbt_expectations.expect_column_to_exist
          - dbt_expectations.expect_column_values_to_be_of_type:
              column_type: integer
          - expect_column_int_sum_to_equal_other_table:
              derived_table: "{{alpha_art_creation_proposal_status_history}}"
              column_other_table: id

This code doesn't execute as expected. I got the following errors:

dbt was unable to infer all dependencies for the model "expect_table_row_count_to_equal_derived_table_alpha_art_creation_proposal_status_history__alpha_art_creation_proposal_status_history_"

This typically happens when ref() is placed within a conditional block.

-- depends_on: {{ ref('alpha_order_item_briefing') }}

These messages sound pretty clear to me. However, I'm finding myself to solve this error properly. What I've tried so far:

  • Add a variable depends_on to my macro function and pass this command (-- depends_on: {{ ref('alpha_order_item_briefing') }}) through that. Didn't work.
  • Add the -- depends_on: {{ ref('alpha_order_item_briefing') }} inside my macro. Didn't work.
  • Pass the (-- depends_on: {{ ref('alpha_order_item_briefing') }}) in the derived_table together with the table ("derived_table: -- depends_on: {{ ref('alpha_order_item_briefing') }} "{{alpha_art_creation_proposal_status_history}}"). It worked!

Although I found a solution, it isn't the most suitable one. If I had had too many dependencies, my string would be a monster. So, how to solve this problem? Thanks in advance.


Solution

  • I found a solution, although not sure if it's the best one.

    First, I added (again) a dependency variable in my test function. Here is the code updated:

    {% test expect_column_int_sum_to_equal_other_table(model, other_table, column_name, column_other_table, dependencies=none, derived_table=none) %}
    
    {% if dependencies is not none %}
        {% for dependency in dependencies %}
            {{dependency}}
        {% endfor %}
    {% endif %}
    
    {{derived_table if derived_table is not none}}
    
    SELECT 1
    GROUP BY 1
    HAVING (
        (SELECT sum({{column_other_table}}) FROM
            {% if derived_table is not none %}
                final
            {% else %}
                {{other_table}}
            {% endif %}
        ) -
    
        (SELECT sum({{column_name}}) FROM {{model}})
    ) != 0
    
    {% endtest %}
    

    Now I can pass a list of dependencies, if needed, to my test function. Then I pass my dependencies as a list in my yaml file. The code is somewhat like this:

    version: 2
    
    models:
      - name: alpha_art_creation_proposal_status_history
        columns:
          - name: id
            tests:
              - dbt_expectations.expect_column_to_exist
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: integer
              - expect_column_int_sum_to_equal_other_table:
                  dependencies: ["-- depends_on: {{ ref('alpha_order_item_briefing') }}"]
                  derived_table: "{{alpha_art_creation_proposal_status_history}}"
                  column_other_table: id
    

    This worked fine for me. Hope it helps someone.

    PS: Tested in dbt 1.05. I'm open to hear better solutions.